PROJECTION DEPTH
Michigan State University

As estimators of location parameters, univariate trimmed means 
are well known for their robustness and efficiency. They can serve 
as robust alternatives to the sample mean while possessing high effi- 
ciencies at normal as well as heavy-tailed models. This paper intro- 
duces multidimensional trimmed means based on projection depth 
induced regions. Robustness of these depth trimmed means is inves- 
tigated in terms of the influence function and finite sample breakdown 
point. The influence function captures the local robustness whereas
the breakdown point measures the global robustness of estimators. It 
is found that the projection depth trimmed means are highly robust 
locally as well as globally. Asymptotics of the depth trimmed means 
are investigated via those of the directional radius of the depth in- 
duced regions. The strong consistency, asymptotic representation and 
limiting distribution of the depth trimmed means are obtained. Rel- 
ative to the mean and other leading competitors, the depth trimmed 
means are highly efficient at normal or symmetric models and over- 
whelmingly more efficient when these models are contaminated. Sim- 
ulation studies confirm the validity of the asymptotic efficiency results 
at finite samples. 

1. Introduction. The sample mean is a very standard estimator of the 
"center" of a given data set and possesses many desirable properties. Indeed, 
it is the most efficient estimator at normal models. It, however, is notorious 
for being extremely sensitive to unusual observations (outliers) and heavy- 
tailed distributions. Indeed, the mean possesses the lowest breakdown point. 
To be more robust, the sample median is employed. It has the best breakdown
point among all reasonable location estimators. The median, however,
is not efficient at normal and other light-tailed distributions. Realizing the
drawbacks of the mean and the median and motivated by robustness and effi- 
ciency considerations, Tukey in 1948 introduced trimmed means in real data 
analysis [28]. These estimators can strike a desirable balance between robustness
and efficiency and serve as compromises between the two extremes: the
mean and the median. Despite numerous competitors introduced since 1948, 
the robustness and efficiency advantages keep the trimmed mean as the most 
prevailing estimator of location parameters (see, e.g., [2, 3, 11, 27, 30]). 

Data from the real world, however, are often multidimensional and contain 
"outliers" or "heavy tails." The outliers in high dimensions are far more dif- 
ficult to detect or identify than in the univariate case since it is often difficult 
to plot the data and the outliers are not always in the single coordinates. A 
good example of a real data set of the latter case is given on page 57 of [23]. A
robust procedure such as multidimensional trimming that can automatically 
detect the outliers or heavy tails is thus desirable. The task of trimming in 
high dimensions, however, turns out to be nontrivial, for there is no natural 
order principle in high dimensions. On the other hand, data depth has been shown
to be a promising tool for providing a center-outward ordering of multidi- 
mensional observations; see [14, 29, 37], for example. Points deep inside a 
data cloud get high depth and those on the outskirts get lower depth. With 
a depth induced ordering, it becomes quite straightforward to define multi- 
variate trimmed means. Indeed, examples are given in [4, 5, 18, 19, 21, 32], 
all for Tukey halfspace depth trimming; in [14] and [6] for Liu simplicial 
depth trimming; and in [34] for general depth trimming (see Section 6 for a 
detailed discussion). 

A natural question raised for depth induced multidimensional trimmed 
means is: Do they share the same robustness and efficiency advantages of 
their univariate counterparts over the sample mean? No answer has been 
given in the literature. Indeed, except for a very few sporadic discussions, 
very little attention has been paid to the depth based multivariate trimmed 
means and little is known about their robustness and efficiency. To answer 
the aforementioned question and to shed light on the robustness and the 
efficiency aspects of a class of depth trimmed means, the projection depth 
trimmed means, is the objective of this article. Although the paper focuses 
on projection depth trimmed means, the technical approaches are applicable 
to other (such as halfspace) depth trimmed means and covariance matrices 
as well. Motivation for selecting projection depth is addressed in Section 6. 

The paper investigates the local as well as the global robustness of the 
depth trimmed means via the influence function and breakdown point, re- 
spectively. Deriving the influence function of the depth trimmed means 
is exceptionally involved. The difficulty lies in handling the distribution- 
dependent depth trimming region. 

To investigate the large sample behavior (such as asymptotic relative efficiency)
of the sample projection depth trimmed means, we have to establish
their limiting distributions. The trimming nature of the estimators makes 
the study of the asymptotics very challenging. Standard asymptotic theory 
falls short of the goal. Indeed, even establishing the limiting distribution of 
the regular univariate trimmed means is not as straightforward as one might 
imagine. One misconception about this is that the task should be similar to 
or not much more challenging than the one for the sample mean. In fact, the 
limiting distribution of the regular trimmed means which were introduced 
as early as 1948 by Tukey (or perhaps earlier) was not established until 1965 
by Bickel. Classical textbooks today still do not prove the limiting distribu- 
tion and only point out the ad hoc proof of Bickel [2] or Stigler [26] without 
details. Another misconception about the limiting distribution is that it just 
follows in a straightforward fashion after one derives the influence function. 
This actually is not always the case (as shown here in this paper and else- 
where). The challenging task of establishing limiting distributions in this 
paper for the multidimensional depth trimmed means is accomplished by 
utilizing empirical process theory (see [32] or [20]). 

The paper shows that the projection depth trimmed means (with robust 
choices of univariate location and scale measures) are highly robust locally 
(with bounded influence functions) and globally (with the best breakdown 
point among affine equivariant competitors), as well as highly efficient rela- 
tive to the mean (and depth medians) at normal and heavy-tailed models. 
The latter is especially true when the models are slightly contaminated. 
Findings in the paper indicate that the projection depth trimmed means 
represent very favorable choices, among the leading competitors, of robust 
and efficient location estimators for multivariate data. 

The rest of the paper is organized as follows. Section 2 introduces pro- 
jection depth induced regions and trimmed means and discusses some fun- 
damental properties. Section 3 is devoted to the study of the local (the 
influence function) as well as the global (the finite sample breakdown point) 
robustness of the depth trimmed estimators. Asymptotic representations 
and asymptotics are established in Section 4. Section 5 addresses the effi- 
ciency issue of the projection depth trimmed means. Concluding remarks in 
Section 6 end the main body of the paper. Selected proofs of main results 
and auxiliary lemmas are reserved for the Appendix. 

2. Projection depth regions and trimmed means. 

2.1. Projection depth functions and regions. Let $\mu$ and $\sigma$ be univariate
location and scale measures of distributions. Typical examples of $\mu$ and $\sigma$
include the pair mean and standard deviation (SD) and the pair median
(Med) and median absolute deviation (MAD). Define the outlyingness of
$x \in \mathbb{R}^d$ with respect to (w.r.t.) the distribution $F$ of $X$ in $\mathbb{R}^d$ ($d \ge 1$) as ([4]
and [25])

(1) $O(x, F) = \sup_{u \in S^{d-1}} |g(x, u, F)|,$

where $S^{d-1} = \{u : \|u\| = 1\}$, $g(x, u, F) = (u'x - \mu(F_u))/\sigma(F_u)$ is the "generalized
standard deviation" of $u'x$ w.r.t. $F_u$ and $F_u$ is the distribution of $u'X$.
If $u'x - \mu(F_u) = \sigma(F_u) = 0$, we define the generalized standard deviation
$g(x, u, F) = 0$. The projection depth of $x \in \mathbb{R}^d$ w.r.t. the given $F$, $PD(x, F)$,
is then defined as

(2) $PD(x, F) = 1/(1 + O(x, F)).$

Sample versions of $g(x, u, F)$, $O(x, F)$ and $PD(x, F)$ are obtained by replacing
$F$ with its empirical version $F_n$. With $\mu$ and $\sigma$ being the Med and
the MAD, respectively, Liu [15] first suggested the use of $1/(1 + O(x, F_n))$
as a depth function. Zuo and Serfling [37] defined and studied (2) with
$(\mu, \sigma) = $ (Med, MAD). Since PD depends on the choice of $(\mu, \sigma)$, a further
study with general $\mu$ and $\sigma$ is carried out in [33]. It turns out that PD possesses
desirable properties for depth functions (see [37]). For example, it is affine
invariant, maximized at the center of a symmetric distribution, monotonically
decreasing when a point moves along a ray stemming from the deepest
point, and vanishes at infinity. For motivation, examples and other related
discussions of (2), see [33].
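
The sample version of (1) and (2) can be sketched numerically as follows. This is only an illustration under stated assumptions: $(\mu, \sigma)$ = (Med, MAD), and the supremum over $S^{d-1}$ is replaced by a maximum over randomly drawn directions, so the result approximates the exact depth from above. The function names and the parameters `n_dirs` and `seed` are hypothetical and not taken from the paper.

```python
import math
import random
from statistics import median

def mad(values):
    """Median absolute deviation about the median (no consistency factor)."""
    m = median(values)
    return median(abs(v - m) for v in values)

def random_unit_vector(d, rng):
    """Draw a direction uniformly from the unit sphere S^{d-1}."""
    while True:
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in z))
        if norm > 0.0:
            return [c / norm for c in z]

def projection_depth(x, sample, n_dirs=500, seed=0):
    """Approximate PD(x, F_n) = 1 / (1 + O(x, F_n)) with (Med, MAD).

    The sup over S^{d-1} is replaced by a max over n_dirs random directions,
    so the outlyingness is underestimated and the depth overestimated.
    """
    rng = random.Random(seed)
    d = len(x)
    outlyingness = 0.0
    for _ in range(n_dirs):
        u = random_unit_vector(d, rng)
        proj = [sum(ui * pi for ui, pi in zip(u, p)) for p in sample]
        center, scale = median(proj), mad(proj)
        dev = abs(sum(ui * xi for ui, xi in zip(u, x)) - center)
        if scale == 0.0:
            # Convention from the text: g = 0 when numerator and scale both vanish.
            g = 0.0 if dev == 0.0 else math.inf
        else:
            g = dev / scale
        outlyingness = max(outlyingness, g)
    return 1.0 / (1.0 + outlyingness)
```

A point near the middle of the data cloud should receive a depth close to 1, and a point far outside it a depth close to 0, in line with the center-outward ordering described above.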

For any $0 < \alpha < \alpha^* = \sup_{x \in \mathbb{R}^d} PD(x, F) \le 1$, the $\alpha$th projection depth
region is

(3) $PD^\alpha(F) = \{x : PD(x, F) \ge \alpha\}.$

It is a multivariate analogue of the univariate $\alpha$th quantile region $[F^{-1}(\alpha),
F^{-1}(1 - \alpha)]$. The set $\{x : PD(x, F) = \alpha\}$ is called the $\alpha$th projection depth
contour, which is the boundary $\partial PD^\alpha(F)$ of $PD^\alpha(F)$ under some conditions
(see [33]). Structural properties and examples of projection depth regions
and contours are discussed in [33]. Note that $\alpha$ in (3) can also be determined
by the probability content of the resulting region. For example, define $\alpha(\lambda) =
\sup\{\alpha : P_X\{x : PD(x, F) \ge \alpha\} \ge \lambda\}$; then $P_X(PD^{\alpha(\lambda)}(F)) = \lambda$ for a smooth
distribution function $F$. A sample version of $PD^\alpha(F)$, $PD^\alpha(F_n)$, is obtained by
replacing $F$ with its empirical version $F_n$.

We assume throughout that $\mu(F_{sY+c}) = s\mu(F_Y) + c$ and $\sigma(F_{sY+c}) = |s|\sigma(F_Y)$
(affine equivariance) for any scalars $s$ and $c$ and random variable $Y \in \mathbb{R}^1$,
and that

(C0) $\sup_{u \in S^{d-1}} \mu(F_u) < \infty$, $\quad 0 < \inf_{u \in S^{d-1}} \sigma(F_u) \le \sup_{u \in S^{d-1}} \sigma(F_u) < \infty$.

This holds for typical location and scale functionals; see Remark 2.4 of
[33]. It follows that $PD^\alpha(F)$ is compact and has a nonempty interior that
contains the maximum depth point $\theta$ with $PD(\theta, F) = \alpha^*$ (Theorems 2.2 and
2.3 of [33]). By the affine invariance of the projection depth functions, we can
assume without loss of generality that $\theta = 0 \in \mathbb{R}^d$ in our following discussion.
The depth region $PD^\alpha(F)$ can then be characterized by the "directional
radius functional" $R^\alpha(u, F)$,

(4) $R^\alpha(u, F) = \sup\{r \ge 0 : ru \in PD^\alpha(F)\} \quad \forall u \in S^{d-1},$

which is the same as $\inf\{r \ge 0 : ru \notin PD^\alpha(F)\}$. For simplicity, we sometimes
write $R(u, F)$ or $R(u)$ for $R^\alpha(u, F)$ and $R_n(u)$ for $R^\alpha(u, F_n)$ for fixed $\alpha$ and
$F$.

2.2. Projection depth trimmed means and fundamental properties. With
depth regions, one can define the $\alpha$th projection depth trimmed mean (PTM)
by

(5) $PTM^\alpha(F) = \int_{PD^\alpha(F)} w(PD(x, F))\, x\, dF(x) \Big/ \int_{PD^\alpha(F)} w(PD(x, F))\, dF(x),$

where $w(\cdot)$ is a suitable (bounded) weight function on $[0, 1]$ such that the
denominator is nonzero. The latter is true for typical nonzero $w(\cdot)$. Note
that the numerator is bounded since $PD^\alpha(F)$ is; see Theorem 2.3 of [33]. Thus
$PTM^\alpha(F)$ is well defined. Again we may suppress $\alpha$ and (or) $F$ in $PTM^\alpha(F)$
for convenience.
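
As a rough illustration of the sample version of (5), the sketch below computes a depth trimmed mean for bivariate data with a constant weight $w$ and $(\mu, \sigma)$ = (Med, MAD). It is a sketch under assumptions rather than the paper's algorithm: the supremum in the outlyingness is approximated on a grid of `n_dirs` equally spaced directions, and the function name and parameters are hypothetical.

```python
import math
import random
from statistics import median

def depth_trimmed_mean_2d(sample, alpha, n_dirs=180):
    """Sample PTM with constant weight for bivariate data, (Med, MAD)."""
    # Directions on the half circle suffice because |g(x, -u)| = |g(x, u)|.
    dirs = [(math.cos(math.pi * k / n_dirs), math.sin(math.pi * k / n_dirs))
            for k in range(n_dirs)]
    outlyingness = [0.0] * len(sample)
    for ux, uy in dirs:
        proj = [ux * px + uy * py for px, py in sample]
        med = median(proj)
        mad = median(abs(t - med) for t in proj)
        if mad == 0.0:
            continue  # degenerate direction; simply skipped in this sketch
        for i, t in enumerate(proj):
            outlyingness[i] = max(outlyingness[i], abs(t - med) / mad)
    # Keep the sample points whose (approximate) depth is at least alpha.
    kept = [p for p, o in zip(sample, outlyingness) if 1.0 / (1.0 + o) >= alpha]
    if not kept:
        raise ValueError("empty trimming region; choose a smaller alpha")
    n = len(kept)
    return (sum(p[0] for p in kept) / n, sum(p[1] for p in kept) / n)
```

With a constant weight, the estimate is just the average of the sample points that fall inside the (approximate) depth region, so gross outliers with very low depth do not enter the average.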

When $w$ is a (nonzero) constant, (5) gives equal nonzero weight to each
point within the depth region $PD^\alpha(F)$, and zero weight to any point outside
the region. Thus we have exactly the same (0-1) weighting scheme as that
of the regular univariate trimmed mean. Two such $PTM^\alpha(F_n)$'s with $w =
c > 0$ are illustrated in Figure 1 with a bivariate standard normal sample of
size 900 and two values of $\alpha$, one of them 0.36. To treat a broader class of
multidimensional trimmed means, in our following discussion $w$ is nevertheless
allowed to be any suitable nonconstant function.

On the other hand, it is noteworthy that in the degenerate one-dimensional 
case (with a nonzero constant w), (5) yields a new type of trimmed mean 
that is different from the regular one. The difference lies in the trimming 
scheme. For example, at the sample level the regular trimming is based on 
the ranks of sample points whereas (5) is based on the values of the gener- 
alized standard deviations. The latter can lead to more robust and efficient 
estimators (see Sections 3.3 and 5). 
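
The difference between the two trimming schemes in one dimension can be made concrete with a small sketch. In the univariate case $S^{0} = \{-1, 1\}$, so the sample outlyingness is exactly $|x - \mathrm{Med}|/\mathrm{MAD}$ and no approximation is needed. The sketch assumes $(\mu, \sigma)$ = (Med, MAD), a constant weight and $\alpha < 1/2$; the function names are hypothetical.

```python
from statistics import median

def depth_trimmed_mean_1d(xs, alpha):
    """Mean of the points with projection depth >= alpha, (Med, MAD)."""
    med = median(xs)
    mad = median(abs(x - med) for x in xs)

    def depth(x):
        if mad == 0.0:
            # g = 0 if the deviation also vanishes, infinite otherwise.
            return 1.0 if x == med else 0.0
        return 1.0 / (1.0 + abs(x - med) / mad)

    kept = [x for x in xs if depth(x) >= alpha]
    return sum(kept) / len(kept)

def rank_trimmed_mean(xs, alpha):
    """Regular trimmed mean: drop floor(alpha * n) points at each end."""
    n = len(xs)
    k = int(alpha * n)
    kept = sorted(xs)[k:n - k]
    return sum(kept) / len(kept)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(depth_trimmed_mean_1d(data, 0.1))  # drops only the point 1000
print(rank_trimmed_mean(data, 0.1))      # drops one point at each end
```

On this data set the rank-based scheme discards the good point 1 together with the outlier 1000, whereas the depth-based scheme discards only the outlier.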

It can be seen that $PTM^\alpha(\cdot)$ is affine equivariant, that is, $PTM^\alpha(F_{AX+b}) =
A(PTM^\alpha(F_X)) + b$ for any nonsingular $d \times d$ matrix $A$ and $b \in \mathbb{R}^d$, since
$PD(x, F)$ is affine invariant. Hence $PTM^\alpha$ does not depend on the underlying
coordinate system or measurement scale. If $X \sim F$ is centrally symmetric
about $\theta \in \mathbb{R}^d$ [i.e., $\pm(X - \theta)$ have the same distribution], then $PTM^\alpha(F)$
is Fisher consistent about $\theta$, that is, $PTM^\alpha(F) = \theta$, and $PTM^\alpha(F_n)$ is
also centrally symmetric about $\theta$ since $PTM^\alpha(AX_1 + b, \ldots, AX_n + b) =
A\,PTM^\alpha(X_1, \ldots, X_n) + b$ for any nonsingular matrix $A$ and $b \in \mathbb{R}^d$. The
latter also implies that $PTM^\alpha(F_n)$ is unbiased for $\theta$.

Fig. 1. $PTM^\alpha(F_n)$ based on a $N(0, I_2)$ sample of size 900. Left: first $\alpha$ value,
with $PTM^\alpha(F_n) = (-0.05798, -0.02476)$. Right: $\alpha = 0.36$ and
$PTM^\alpha(F_n) = (-0.05571, -0.05848)$.

3. Robustness. Robustness is a fundamental issue in statistics. It has 
long been recognized as a principal performance criterion for statistical pro- 
cedures. We address the local and the global robustness of depth trimmed 
means in this section. 

One popular (qualitative) robustness measure of a statistical procedure 
is its influence function. Let $F$ be a given distribution, let $\delta_x$ be the point-mass
probability distribution at a fixed point $x \in \mathbb{R}^d$ and let $F(\varepsilon, \delta_x) =
(1 - \varepsilon)F + \varepsilon\delta_x$, $\varepsilon \in [0, 1]$, be the point-mass contaminated distribution. The
influence function (IF) of a statistical functional $T$ at $x \in \mathbb{R}^d$ for the given
$F$ is defined as [10]

(6) $IF(x; T, F) = \lim_{\varepsilon \to 0^+} (T(F(\varepsilon, \delta_x)) - T(F))/\varepsilon,$

which describes the relative effect (influence) on $T$ of an infinitesimal point-mass
contamination at $x$, and captures the local robustness of $T$. A functional
with a bounded influence function thus is robust and desirable. The
supremum norm of $IF(x; T, F)$ is called the gross error sensitivity of $T$ at $F$
[10],

(7) $GRE(T, F) = \sup_{x \in \mathbb{R}^d} \|IF(x; T, F)\|,$

the maximum relative effect on $T$ of an infinitesimal point-mass contamination.
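
A finite-sample analogue of (6) may help fix ideas. Adding one point $x$ to a sample of size $n - 1$ produces the empirical distribution $(1 - 1/n)F_{n-1} + (1/n)\delta_x$, so the difference quotient in (6) with $\varepsilon = 1/n$ becomes $n\,(T(x_1, \ldots, x_{n-1}, x) - T(x_1, \ldots, x_{n-1}))$, often called a sensitivity curve. The sketch below, with the hypothetical helper name `sensitivity_curve`, evaluates it for the univariate mean and median; it is meant as an illustration, not as part of the paper's derivations.

```python
from statistics import mean, median

def sensitivity_curve(estimator, sample, x):
    """Difference quotient of (6) at epsilon = 1/n, n = len(sample) + 1."""
    n = len(sample) + 1
    return n * (estimator(sample + [x]) - estimator(sample))

base = [float(i) for i in range(-5, 6)]  # 11 points, mean 0 and median 0
for x in (10.0, 1e3, 1e6):
    # The mean's value grows with x; the median's stays at the same level.
    print(x, sensitivity_curve(mean, base, x), sensitivity_curve(median, base, x))
```

For the mean the curve grows linearly in $x$, while for the median it levels off, mirroring the unbounded and bounded influence functions of the two functionals.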

It is well known that the mean functional has an unbounded influence 
function whereas that of the regular univariate trimmed mean functional is 
bounded; see [24], for example. The natural concern now is whether the influence
function of the projection depth trimmed mean functional is bounded.

Note that the integral region in the definition of $PTM^\alpha(F)$ is a functional
of $F$. An infinitesimal point-mass contamination hence affects this
region. The derivation of the influence function of $PTM^\alpha(F)$ thus becomes
challenging. Our strategy to attack the problem is "divide and conquer": to
work out the influence function of the projection depth region first and then
the influence function of the projection depth region induced trimmed mean
functional based on the preliminary results.

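3.1. Influence function of depth region. Here we establish the influence
function of $R^\alpha(u, F)$. Denote by $F_u(\varepsilon, \delta_x)$ the projected distribution of $F(\varepsilon, \delta_x)$
to a unit vector $u$. Then $F_u(\varepsilon, \delta_x) = (1 - \varepsilon)F_u + \varepsilon\delta_{u'x}$. For simplicity we sometimes
write $F_\varepsilon$ and $F_{\varepsilon u}$ for $F(\varepsilon, \delta_x)$ and $F_u(\varepsilon, \delta_x)$, respectively, for the fixed
$x \in \mathbb{R}^d$. We need the following itemized conditions. Denote by $o_x(1)$ a quantity
that may depend on a given point $x \in \mathbb{R}^d$ but approaches 0 as $\varepsilon \to 0$ for
the fixed $x$.

(C1) $\mu(\cdot)$ and $\sigma(\cdot)$ at $F_u$ and $F_{\varepsilon u}$ are continuous in $u \in S^{d-1}$ and $\sigma(F_u) > 0$,

(C2) $|\mu(F_{\varepsilon u}) - \mu(F_u)| = o_x(1)$, $|\sigma(F_{\varepsilon u}) - \sigma(F_u)| = o_x(1)$ uniformly in $u \in S^{d-1}$,

(C3) $(\mu(F_{\varepsilon u}) - \mu(F_u))/\varepsilon = IF(u'x; \mu, F_u) + o_x(1)$, $(\sigma(F_{\varepsilon u}) - \sigma(F_u))/\varepsilon = IF(u'x; \sigma, F_u) + o_x(1)$
uniformly in $u \in S^{d-1}$ for fixed $x \in \mathbb{R}^d$.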

Conditions (C1)-(C3) hold for smooth M-estimators of location and scale
(and also for the Med and MAD); see [[12], page 136] and [[33], page 1468].
Note that (C0)-(C3) are connected (nested) in the sense that (C1) implies
(C0), (C3) implies (C2) if $IF(u'x; \mu, F_u)$ and $IF(u'x; \sigma, F_u)$ are bounded in
$u$, and (C2) holds when (C1) holds and $\mu(F_{\varepsilon u}) - \mu(F_u) = o_x(1)$ and $\sigma(F_{\varepsilon u}) -
\sigma(F_u) = o_x(1)$ for any $u = u(\varepsilon) \to u_0$. The latter holds trivially for continuous
functionals $\mu(\cdot)$ and $\sigma(\cdot)$, that is, $\mu(G) \to \mu(F)$ and $\sigma(G) \to \sigma(F)$ as $G$
converges weakly to $F$.

When $\mu(F_u)$ and $\sigma(F_u)$ are continuous in $u$ and $\sigma(F_u) > 0$, there is a unit
vector $v(x)$ such that $g(x, v(x), F) = O(x, F)$ for $x \in \mathbb{R}^d$. With $v(x)$ we can
drop $\sup_{\|u\|=1}$ in the definition of $O(x, F)$, which greatly facilitates technical
treatments. Define

(8) $U(x) = \{v(x) : g(x, v(x), F) = O(x, F)\}, \quad x \in \mathbb{R}^d.$

It is usually a singleton (or a finite set) for continuous $F$ and $x \in \partial PD^\alpha(F)$
(see comments after Theorem 2). Indeed, to construct a counterexample is
difficult. In the following we consider the case that $U$ is a singleton for the
sake of convenience.

Theorem 1. Assume that $IF(v(y)'x; \mu, F_{v(y)})$ and $IF(v(y)'x; \sigma, F_{v(y)})$
are continuous in $v(y)$ for $y \in \partial PD^\alpha(F)$ with $y/\|y\| \in A \subseteq S^{d-1}$ and $U(y)$
is a singleton for any $y \in \partial PD^\alpha(F)$. Then under (C1)-(C3) with $\beta(\alpha) =
(1 - \alpha)/\alpha$,

$\dfrac{R^\alpha(u, F(\varepsilon, \delta_x)) - R^\alpha(u, F)}{\varepsilon} = \dfrac{\beta(\alpha)\, IF(v(y)'x; \sigma, F_{v(y)}) + IF(v(y)'x; \mu, F_{v(y)})}{u'v(y)} + o_x(1),$

uniformly in $u \in A$ with $y = R^\alpha(u, F)u$. The influence function of $R^\alpha(u, F)$
is thus given by the first term on the right-hand side.

The proof of the theorem, technically very demanding and challenging, is
given in the Appendix. The influence function of $R^\alpha(u, F)$ at $x$ is determined
by those of $\mu$ and $\sigma$ at $v(y)'x$ for the projected distribution $F_{v(y)}$ with
$y = R^\alpha(u, F)u$. Since $u'v(y)$ is bounded away from 0 uniformly in $u$ (shown
in the proof of the theorem), $IF(x; R^\alpha(u, F), F)$ is bounded as long as those
of $\mu$ and $\sigma$ are bounded for $F_{v(y)}$.

The continuity in $v(y)$ of the influence functions of $\mu$ and $\sigma$ at the point
$v(y)'x$ for $F_{v(y)}$ with $y \in \partial PD^\alpha$ and $y/\|y\| \in A$ is important. This and the
other conditions in the theorem are met with $A = S^{d-1}$ by typical smooth
location and scale measures such as the mean and the standard deviation
and other M-type location and scale measures (see [12]). They are also met
by less smooth ones such as the Med and the MAD for suitable (such as
elliptically symmetric) distributions. A random vector $X \sim F$ is elliptically
symmetric about $\theta$ if $u'(X - \theta) \stackrel{d}{=} \sqrt{u'\Sigma u}\, Z$ for $u \in S^{d-1}$, some positive definite
matrix $\Sigma$ and some random variable $Z \in \mathbb{R}^1$ with $Z \stackrel{d}{=} -Z$, where "$\stackrel{d}{=}$"
stands for "equal in distribution." Denote by $F_{\theta,\Sigma}$ such a distribution $F$.
Assume, w.l.o.g., that $\theta = 0$ and $\mathrm{MAD}(Z) = m_0$.

Corollary 1. Let $(\mu, \sigma) = $ (Med, MAD) and $F = F_{\theta,\Sigma}$ with $Z$ having
a density $h_z$ that is continuous and positive in small neighborhoods of 0 and
$m_0$. Then

$(R^\alpha(u, F(\varepsilon, \delta_x)) - R^\alpha(u, F))/\varepsilon$

$\quad = \left( \dfrac{\beta(\alpha)\, \mathrm{sign}(|u'\Sigma^{-1}x| - \|\Sigma^{-1/2}u\| m_0)}{4h_z(m_0)} + \dfrac{\mathrm{sign}(u'\Sigma^{-1}x)}{2h_z(0)} \right) \Big/ \|\Sigma^{-1/2}u\| + o_x(1),$

uniformly in $u \in A$, where $A$ is $S^{d-1}$ if $x = 0$ or consists of all $u \in S^{d-1}$
except those $u$'s with $u'\Sigma^{-1}x = 0$ or $u'\Sigma^{-1}x = \pm\|\Sigma^{-1/2}u\| m_0$. Hence the influence
function of $R^\alpha(u, F)$, the first term on the right-hand side, is bounded
for $u \in A$.

By Theorem 1 and Corollary 1, $IF(x; R^\alpha(u, F), F)$ is continuous in $v(y)$
with $y = R^\alpha(u, F)u$ for $u \in A$ and depends on $\alpha$ through $\beta(\alpha)$ only. Its existence
and behavior for $u \in S^{d-1} - A$ are of little interest for $IF(x; PTM^\alpha(F), F)$,
the ultimate goal of all the discussion in this subsection, and not covered by
the above results.

The influence function in Corollary 1 is bounded in $x \in \mathbb{R}^d$ for any $u \in A$.
This, however, is not true if we select nonrobust $\mu$ and $\sigma$. For example, if $\mu$
and $\sigma$ are the mean and the standard deviation (SD), then for $u \in A = S^{d-1}$,
Theorem 1 gives

$IF(x; R^\alpha(u, F), F) = \dfrac{\beta(\alpha)\big((u'\Sigma^{-1}x)^2 - \sigma_z^2\|\Sigma^{-1/2}u\|^2\big)}{2\sigma_z\|\Sigma^{-1/2}u\|^3} + \dfrac{u'\Sigma^{-1}x}{\|\Sigma^{-1/2}u\|^2}$

with $\sigma_z^2 = \mathrm{var}(Z)$, which is no longer bounded in $x \in \mathbb{R}^d$. To illustrate
graphically this influence function and the one in the corollary, we consider
$F = N_2(0, I)$ and $\alpha = 0.2$ for simplicity. By orthogonal equivariance,
we can just consider $u_0 = (1, 0)'$. The influence functions for (Med, MAD)
and (mean, SD) become respectively

$\mathrm{sign}(|x_1| - c)/f(c) + \mathrm{sign}(x_1)/(2f(0)), \qquad 2x_1^2 + x_1 - 2 \qquad$ for $x = (x_1, x_2)'$,

with $c = \Phi^{-1}(3/4)$ and $f$ the density of $N(0, 1)$, which are plotted in Figure 2.

Figure 2 indicates that $IF(x; R^\alpha(u, F), F)$ with $(\mu, \sigma) = $ (Med, MAD) is
a step function jumping at $|x_1| = 0$ and $c$ and is bounded, whereas with
$(\mu, \sigma) = $ (mean, SD) it is continuous everywhere but unbounded.

Fig. 2. The influence functions of $R^\alpha(u, F)$ with $F = N_2(0, I)$, $\alpha = 0.2$ and $u = (1, 0)'$.
Left: $(\mu, \sigma) = $ (Med, MAD); right: $(\mu, \sigma) = $ (mean, SD).

Equipped with preliminary results in this subsection, we are now in position
to pursue the influence function of the projection depth region induced
means.

3.2. Influence function of depth trimmed means. To work out the influence
function of the depth trimmed mean functional, we need these conditions:

(C4) $U(y)$ is a singleton for $y \in B \subseteq PD^\alpha(F)$ with $P_F(PD^\alpha(F) - B) = 0$,

(C5) $IF(u'x; \mu, F_u)$ and $IF(u'x; \sigma, F_u)$ are bounded in $u \in S^{d-1}$ and continuous
in $u$ for $u \in \{v(y) : y \in B\}$ with $\int_{S^{d-1} - \{v(y) : y \in B \cap \partial PD^\alpha\}} du = 0$.

In light of the discussion and examples in the last subsection, (C4)-(C5)
hold for continuous $F$ and common location and scale functions $\mu$
and $\sigma$ in general. Under these conditions and by virtue of Theorem 1,
$IF(x; R^\alpha(u, F), F)$ exists for fixed $x$, $\alpha$ and $F$ and for any $u \in A = \{y/\|y\| : y \in
B \cap \partial PD^\alpha\}$ with $\int_{S^{d-1} - A} du = 0$. Now assume that $F^{(1)} = f$ and $w^{(1)}$ exists;
then we can define for fixed $\alpha$

$l_1(x) = \dfrac{\int_{S^{d-1}} (R(u)u - PTM(F))\, w(\alpha)\, f(R(u)u)\, |J(u, R(u))|\, IF(x; R(u), F)\, du}{\int_{PD^\alpha(F)} w(PD(y, F))\, dF(y)},$

$l_2(x) = \dfrac{\int_{PD^\alpha(F)} (y - PTM(F))\, w^{(1)}(PD(y, F))\, h(x, y)\, dF(y)}{\int_{PD^\alpha(F)} w(PD(y, F))\, dF(y)},$

$l_3(x) = \dfrac{(x - PTM(F))\, w(PD(x, F))\, I(x \in PD^\alpha(F))}{\int_{PD^\alpha(F)} w(PD(y, F))\, dF(y)},$

where

$h(x, y) = \dfrac{O(y, F)\, IF(v(y)'x; \sigma, F_{v(y)}) + IF(v(y)'x; \mu, F_{v(y)})}{\sigma(F_{v(y)})\,(1 + O(y, F))^2}$

and $J(u, r)$ is the Jacobian of the transformation from $x \in \mathbb{R}^d$ to $(u, r) \in
S^{d-1} \times [0, \infty)$. If we let $x_1 = r\cos\theta_1, \ldots, x_{d-1} = r\sin\theta_1 \sin\theta_2 \cdots
\sin\theta_{d-2}\cos\theta_{d-1}$, $x_d = r\sin\theta_1 \cdots \sin\theta_{d-2}\sin\theta_{d-1}$, then $u = x/r$ and $J(u, r) =
r^{d-1}\sin^{d-2}\theta_1 \cdots \sin\theta_{d-2}$.

Theorem 2. Assume that $F$ has a density $f$ that is continuous in
a small neighborhood of $\partial PD^\alpha(F)$ and $w(\cdot)$ is continuously differentiable.
Then under (C1)-(C5), $IF(x; PTM^\alpha(F), F) = l_1(x) + l_2(x) + l_3(x)$, which
is bounded as long as the influence functions of $\mu$ and $\sigma$ are bounded at $F_u$
for any fixed $u$.

Note that $\mu$ and $\sigma$ in the theorem can be very general, including mean
and SD, Med and MAD, and general M-functionals of location and scale.
For robust choices, $PTM^\alpha$ has a bounded influence function and hence is
(locally) robust.

Note that $U(y)$ usually is a singleton for $y \in \mathbb{R}^d$ (with the center of symmetry
for symmetric $F$ as an exception). For example, if $F = F_{0,\Sigma}$ then
$U(y) = \{\Sigma^{-1}y/\|\Sigma^{-1}y\|\}$ for all $y \ne 0$ and any affine equivariant $\mu$ and $\sigma$. The
condition (C4) in the theorem thus is quite mild. The uniqueness of $v(y)$
ensures a unique limit of the counterpart $v_\varepsilon(y)$ of $v(y)$ as $\varepsilon \to 0$ (see the
proof of the theorem). The continuity of $IF(u'x; \mu, F_u)$ and $IF(u'x; \sigma, F_u)$ in
$u$ for $u \in \{v(y) : y \in B\}$ is sufficient for invoking the result in Theorem 1 and
ensures the existence of $h(x, y)$, the influence function of $PD(y, F)$ at point $x$
for any $y \in B$. Conditions in the theorem are usually met by smooth $(\mu, \sigma)$'s
such as (mean, SD) and also by less smooth ones such as (Med, MAD).

When $w$ is a nonzero constant (a special yet important case), the influence
function $IF(x; PTM^\alpha(F), F)$ becomes $l_1(x) + l_3(x)$ with both terms greatly
simplified.

On the other hand, for specific $(\mu, \sigma)$ and $F$ such as (Med, MAD) and
$F = F_{\theta,\Sigma}$, the result in the theorem can be made concrete. Since it is readily
seen that

(9) $IF(x; PTM^\alpha, F_{\theta,\Sigma}) = \Sigma^{1/2}\, IF(\Sigma^{-1/2}(x - \theta); PTM^\alpha, F_{0,I}),$

we thus will focus on the case $\theta = 0$ and $\Sigma = I$ without loss of generality.
We have:

Corollary 2. Let $(\mu, \sigma) = $ (Med, MAD), $F = F_{0,I}$ with density $h_z$ of
$Z$ continuous and $> 0$ in small neighborhoods of 0 and $m_0$, and let $w^{(1)}$ be
continuous. Then

$IF(x; PTM^\alpha(F), F)$

$\quad = \Big( \int_{S^{d-1}} c(\alpha)\, w(\alpha)\, f(c(\alpha)u)\, u\, |J(u, c(\alpha))|\, IF(x; R(u), F)\, du$

$\qquad + x c_1/\|x\| + x\, w(1/(1 + \|x\|/m_0))\, I(\|x\| \le c(\alpha)) \Big) \Big/ c_0,$

with $c_0 = \int_{\|y\| \le \beta(\alpha)} w(1/(1 + \|y\|))\, dF_0(y)$, $c_1 = \int_{\|y\| \le \beta(\alpha)} (m_0 |y_1|\, w^{(1)}(1/
(1 + \|y\|)))/(2h_z(0)(1 + \|y\|)^2)\, dF_0(y)$, $y = (y_1, \ldots, y_d)'$, $c(\alpha) = \beta(\alpha)m_0$ and
$F_0(y) = F(m_0 y)$.

The most desirable feature of an influence function, the boundedness, is
guaranteed by the corollary. This, of course, is no longer true if we select
nonrobust $(\mu, \sigma)$ such as (mean, SD). To illustrate this, we consider for simplicity
$F = N_2(0, I)$ and a nonzero constant weight function $w$. The influence
functions of $PTM^\alpha$ for $(\mu, \sigma) = $ (mean, SD) and (Med, MAD) in this setting
at $x = (x_1, x_2)'$ are respectively

$\Big( \int_0^{2\pi} \beta^2(\alpha)\, g(\beta(\alpha))\, u\, (2(u'x)^2 + u'x - 2)\, d\theta + x\, I(\|x\| \le \beta(\alpha)) \Big) \Big/ P_F(\|y\| \le \beta(\alpha))$

and

$\dfrac{\int_0^{2\pi} (c\beta(\alpha))^2\, g(c\beta(\alpha))\, u\, \big(\beta(\alpha)\,\mathrm{sign}(|u'x| - c)/(4f(c)) + \mathrm{sign}(u'x)/(2f(0))\big)\, d\theta}{P_F(\|y\| \le c\beta(\alpha))} + \dfrac{x\, I(\|x\| \le c\beta(\alpha))}{P_F(\|y\| \le c\beta(\alpha))}$

with $g(r) = e^{-r^2/2}/(2\pi)$ and $u = (\cos\theta, \sin\theta)'$ (and $c$ defined after Corollary 1),
which depend on $\alpha$ through $\beta(\alpha)$ only and are plotted in Figure 3
with $\alpha = 0.2$.

Fig. 3. The first coordinate of the influence function $IF(x; PTM^\alpha(F), F)$ with
$F = N_2(0, I)$ and $\alpha = 0.2$. Left: $(\mu, \sigma) = $ (Med, MAD); right: $(\mu, \sigma) = $ (mean, SD).

Note that the influence functions in this example are two-dimensional and
the figure plots their first coordinates only. The graphs of the second coordinates,
however, are the same as the ones in the figure up to an orthogonal
transformation.

Both influence functions are continuous except at points $x$ with $\|x\| =
c\beta(\alpha)$ or $\beta(\alpha)$. When $\|x\|$ is smaller than these values, the corresponding
influence functions behave (roughly) linearly in $x$. The influence of $PTM^\alpha$
with (Med, MAD) is almost zero when $\|x\| > c\beta(\alpha)$. However, in the case
with (mean, SD) it becomes unbounded eventually as $\|x\| \to \infty$. All these
are reflected clearly in Figure 3.

3.3. Finite sample breakdown point. The projection depth trimmed means
with robust choices of $\mu$ and $\sigma$ have bounded influence functions and thus
are locally robust. This raises the question as to whether they are also globally
robust. We now answer this question via the finite sample breakdown
point, a notion introduced by Donoho and Huber [7] that has become a
prevailing quantitative measure of global robustness of estimators. Roughly
speaking, the breakdown point of a location estimator $T$ is the minimum
fraction of "bad" (or contaminated) points in a data set that can render $T$
beyond any bound. More precisely, the finite sample breakdown point of $T$
at the sample $X^n = \{X_1, \ldots, X_n\}$ in $\mathbb{R}^d$ is defined as

(10) $BP(T_n, X^n) = \min\Big\{ \dfrac{m}{n} : \sup_{X^n_m} \|T_n(X^n_m) - T_n(X^n)\| = \infty \Big\},$

where $X^n_m$ denotes a contaminated data set resulting from replacing $m$ original
points of $X^n$ with $m$ arbitrary points. For a scale estimator $S$, we can
calculate its breakdown point by treating $\log S$ as $T$ in the above definition.

Clearly one bad point can ruin the sample mean, hence it has a breakdown
point $1/n$, the lowest possible value. The univariate $\alpha$th trimmed mean
(trimming $\lfloor \alpha n \rfloor$ data points at both ends of the data) has a breakdown
point $(\lfloor \alpha n \rfloor + 1)/n$, which can be much higher than that of the mean. Here
$\lfloor \cdot \rfloor$ is the floor function. So the univariate trimmed means can serve as robust
alternatives to the sample mean.
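
These two breakdown points can be checked empirically with a crude experiment: replace $m$ sample points by one huge value and find the smallest $m$ that moves the estimate by more than a large tolerance. The sketch below is only an illustration; the helper names, the contamination value `big` and the threshold `tol` are assumptions, and because a single contamination pattern is tried, the count it returns is in general only an upper bound on the true breakdown count.

```python
import math

def sample_mean(xs):
    return sum(xs) / len(xs)

def trimmed_mean(xs, alpha):
    """Regular trimmed mean: drop floor(alpha * n) points at each end."""
    n = len(xs)
    k = math.floor(alpha * n)
    kept = sorted(xs)[k:n - k]
    return sum(kept) / len(kept)

def empirical_breakdown_count(estimator, xs, big=1e12, tol=1e6):
    """Smallest m for which replacing m points by `big` shifts the estimate by more than tol."""
    base = estimator(xs)
    for m in range(1, len(xs) + 1):
        corrupted = [big] * m + list(xs[m:])
        if abs(estimator(corrupted) - base) > tol:
            return m
    return None

data = [float(i) for i in range(20)]
print(empirical_breakdown_count(sample_mean, data))
print(empirical_breakdown_count(lambda s: trimmed_mean(s, 0.1), data))
```

For $n = 20$ and $\alpha = 0.1$, the counts agree with $1$ and $\lfloor \alpha n \rfloor + 1 = 3$, that is, with breakdown points $1/n$ and $(\lfloor \alpha n \rfloor + 1)/n$.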

For a projection depth trimmed mean, its breakdown point clearly depends
on the choice of $(\mu, \sigma)$ in the definition of PD. Typical robust choices
of $(\mu, \sigma)$ include robust M-estimators of location and scale such as (Med,
MAD). In the following discussion we first confine attention to the robust
choice (Med, MAD) and then comment on the general choices of $(\mu, \sigma)$. We
also modify MAD slightly so that the resulting scale measure is less likely
to be 0 and consequently the resulting PD trimmed mean has a higher
breakdown point. Specifically, we use for $1 \le k \le n$,

$\mathrm{MAD}_k = \mathrm{Med}_k\{|x_i - \mathrm{Med}\{x_i\}|\},$

$\mathrm{Med}_k\{x_i\} = (x_{(\lfloor (n+k)/2 \rfloor)} + x_{(\lfloor (n+k+1)/2 \rfloor)})/2,$

where $x_{(1)} \le \cdots \le x_{(n)}$ are the ordered values of $x_1, \ldots, x_n$ in $\mathbb{R}^1$. The same
idea of modifying MAD to achieve a higher breakdown point for the related
estimators has been employed in [31], [9] and [33], for example. Note that
when $k = 1$, $\mathrm{MAD}_k$ is just the regular MAD.
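
The modified scale measure is easy to compute from order statistics. The sketch below follows the two displayed definitions literally (1-based order statistics, regular median for the centering); the function names `med_k` and `mad_k` are hypothetical.

```python
from statistics import median

def med_k(xs, k):
    """Average of the order statistics floor((n+k)/2) and floor((n+k+1)/2), 1-based."""
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    s = sorted(xs)
    lo = (n + k) // 2
    hi = (n + k + 1) // 2
    return (s[lo - 1] + s[hi - 1]) / 2

def mad_k(xs, k):
    """MAD_k: Med_k of the absolute deviations from the regular median."""
    m = median(xs)
    return med_k([abs(x - m) for x in xs], k)

data = [3, 1, 4, 1, 5, 9, 2, 6]
print(mad_k(data, 1), mad_k(data, 3))
```

For $k = 1$ the two order statistics are the usual middle ones, so `mad_k(data, 1)` coincides with the regular MAD; larger $k$ moves to higher order statistics of the absolute deviations, which makes a zero scale less likely.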

For projection depth (or any other depth) trimming, an important issue
in practice is how to determine an appropriate value of $\alpha$ so that $PD^\alpha(F_n) \cap X^n$
is not empty and hence $PTM^\alpha(F_n)$ is well defined. It can be shown (based
on empirical process theory) that $PD^\alpha(F_n) \cap X^n$ is nonempty almost surely
for suitable $\alpha$ under some mild conditions including $P_F(PD^\alpha(F)) > 0$ and
sufficiently large sample size $n$.

For univariate data, a "pre-data" approach of determining a value of $\alpha$
can be employed in practice. In this case it is not difficult to see that the
projection depth of the order statistic $X_{(\lfloor (n+1)/2 \rfloor)}$ is always no less than
1/2. Hence $PD^\alpha(F_n) \cap X^n$ is nonempty as long as $\alpha \le 1/2$. For multidimensional
data, a "post-data" approach can be adopted. That is, the value of $\alpha$ is
data-dependent and determined after $X^n$ becomes available. Since we have
to calculate $PD(X_i, F_n)$ anyhow, an appropriate value of $\alpha$ for the trimming
can be determined afterward. Or we may select a data-dependent $\alpha$ so that
$PD^\alpha(F_n) \cap X^n$ is nonempty, as is done in the following result.

A data set $X^n$ in $\mathbb{R}^d$ ($d \ge 1$) is in general position if there are no more
than $d$ sample points contained in any $(d - 1)$-dimensional hyperplane. This
is true almost surely if the sample is from an absolutely continuous distribution
$F$. We have:



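Theorem 3. Let $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD}_k)$ with $k = 1$ for $d = 1$ and $k =
d + 1$ for $d > 1$ and let $X^n$ be in general position and $n > d + 1$ for $d > 1$.
Then

$BP(PTM^\alpha, X^n) = \lfloor (n + 1)/2 \rfloor / n$ for $d = 1$, $0 < \alpha \le 1/2$,

$BP(PTM^\alpha, X^n) = \lfloor (n - d + 1)/2 \rfloor / n$ for $d > 1$, $0 < \alpha \le \alpha_d$,

where the weight $w(r) > 0$ for $0 < r \le 1$ and $\alpha_d := \alpha_d(X^n)$ ($d > 1$) satisfying

(11) $\max \ldots$

and $i_1, \ldots, i_r$ are $r$ arbitrary distinct integers from the set $\{1, 2, \ldots, n\}$.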

Note that the denominator on the right-hand side of (11) is bounded
away from 0 uniformly in $u$ since $X^n$ is in general position. Hence $\alpha_d$
is well defined. It is also seen that $\alpha_d(X^n)$ is affine invariant, that is,
$\alpha_d(AX^n + b) = \alpha_d(X^n)$ for any nonsingular matrix $A$ and $b \in \mathbb{R}^d$, where
$AX^n = \{AX_1, \ldots, AX_n\}$. Thus $PTM^\alpha$ is affine equivariant. Clearly $\alpha_d(X^n) \le
\min\{\ldots, \sup_i PD(X_i, X^n)\}$. Indeed, it is seen that $O(X_i, X^n_m)$ is no greater
than the right-hand side of (11) for any original $X_i \in X^n$ and any $0 \le m \le
\lfloor (n - d + 1)/2 \rfloor - 1$. Hence $PTM^\alpha$ for $d > 1$ is well defined.

The main idea of the proof can be briefly explained as follows. The estimator
breaks down only if $PD^\alpha(X^n_m)$ is empty or contains points of $X^n_m$ with
arbitrarily large norms. This cannot happen unless $\mu$ and (or) $\sigma$ break(s)
down. To break down Med, $\lfloor (n + 1)/2 \rfloor$ contaminating points (a fraction
$\lfloor (n + 1)/2 \rfloor / n$) are needed. With $m = \lfloor (n - d + 1)/2 \rfloor$ contaminating points
in $\mathbb{R}^d$ ($d > 1$), we can force $\mathrm{MAD}_{d+1}(u'X^n_m)$ for the special $u$ to be zero or
unbounded. All these can lead to the breakdown of $PTM^\alpha$.

The breakdown point results in the theorem are striking. In $\mathbb{R}^1$, $PTM^{\alpha}$
for any $\alpha \in (0, 1/2]$ achieves the best breakdown point of any translation
equivariant location estimators (see [17]). Note that the breakdown point of
the regular $\alpha$th trimmed mean is only $(\lfloor \alpha n \rfloor + 1)/n$, which is lower unless
$\alpha \approx 0.50$ (which corresponds to the median). The difference in breakdown
point between the two types of trimmed means is due to the difference in
trimming. In the projection depth trimming case, trimming is done based on
the values of the $|X_i - \mathrm{Med}(X^n)|/\mathrm{MAD}(X^n)$'s, while in the regular trimming
(equivalent to Tukey halfspace depth trimming) case, it is done based on the
ranks of the $X_i$'s.
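
The contrast between the two trimming schemes in one dimension can be sketched numerically. The following is only an illustration under stated assumptions, not the paper's implementation: the weight $w$ is taken to be constant, the ordinary MAD is used, projection depth is taken as $1/(1 + \text{outlyingness})$, and the function names `pd_trimmed_mean_1d` and `rank_trimmed_mean_1d` are hypothetical.

```python
import numpy as np

def pd_trimmed_mean_1d(x, alpha):
    """Average the points whose univariate projection depth is >= alpha.

    Assumes a constant weight and MAD > 0 (hypothetical helper).
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    outlyingness = np.abs(x - med) / mad      # |X_i - Med| / MAD
    depth = 1.0 / (1.0 + outlyingness)        # depth taken as 1 / (1 + O)
    return x[depth >= alpha].mean()

def rank_trimmed_mean_1d(x, alpha):
    """Regular alpha-trimmed mean: drop floor(alpha * n) points at each end."""
    x = np.sort(np.asarray(x, dtype=float))
    k = int(np.floor(alpha * x.size))
    return x[k:x.size - k].mean()

# 21 clean points symmetric about 0 plus 9 gross outliers (30% contamination).
clean = np.linspace(-1.0, 1.0, 21)
x = np.concatenate([clean, np.full(9, 1000.0)])

pd_estimate = pd_trimmed_mean_1d(x, alpha=0.1)
rank_estimate = rank_trimmed_mean_1d(x, alpha=0.1)
```

In this toy sample the depth-based rule discards only the nine outliers, because their outlyingness is enormous, whereas the rank-based rule removes three points from each end and so keeps six outliers. The contamination fraction 9/30 exceeds $(\lfloor \alpha n \rfloor + 1)/n = 4/30$ but stays below $\lfloor (n+1)/2 \rfloor / n = 15/30$.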

In $\mathbb{R}^d$ ($d > 1$), the breakdown point, $\lfloor (n-d+1)/2 \rfloor / n$, is also the best
(highest) among existing affine equivariant location estimators, with very
few exceptions.

Note that the theorem allows one to select very small $\alpha$ values (e.g., 0.05 or
0.10), which then can lead to very high efficiency for $PTM^{\alpha}$ at
(contaminated) normal models (see Section 5), while enjoying very high breakdown
point robustness.

For simplicity, the theorem reports only the best breakdown points with
the corresponding $d$ and $k$. For general $k$ and $d$, it can be shown that
$BP(PTM^{\alpha}, X^n)$ is $\lfloor (n-k+2)/2 \rfloor / n$ for $d = 1$ and
$\min\{\lfloor (n+k+1-2d)/2 \rfloor, \lfloor (n-k+2)/2 \rfloor\}/n$ for $d > 1$. The theorem can be
extended for arbitrary $X^n$. In this case, the BP results still hold if $d$ is
replaced by $c(X^n)$, the maximum number of sample points contained in any
$(d-1)$-dimensional hyperplane. The theorem considers robust choices of $\mu$ and
$\sigma$. It can be extended for more general cases. For example, if (mean, SD) is
used, then $BP(PTM^{\alpha}, X^n)$ is $1/n$. For general $(\mu, \sigma)$, the BP of $PTM^{\alpha}$ is no
less than the minimum of the BPs of $\mu$ and $\sigma$ at $u'X^n$ for arbitrary $u \in S^{d-1}$.
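
The breakdown point formulas above are easy to tabulate. The sketch below simply encodes them as stated, assuming a sample in general position and an admissible $\alpha$; the function names `bp_ptm` and `bp_regular_trimmed_mean` are hypothetical.

```python
def bp_ptm(n, d, k=None):
    """Breakdown point of PTM^alpha with (Med, MAD_k), per the stated formulas.

    Assumes general position and an admissible alpha (hypothetical helper).
    """
    if k is None:
        k = 1 if d == 1 else d + 1            # the choices used in Theorem 3
    if d == 1:
        m = (n - k + 2) // 2
    else:
        m = min((n + k + 1 - 2 * d) // 2, (n - k + 2) // 2)
    return m / n

def bp_regular_trimmed_mean(n, alpha):
    """Breakdown point (floor(alpha * n) + 1) / n of the regular trimmed mean."""
    return (int(alpha * n) + 1) / n

bp_1d = bp_ptm(20, 1)                          # floor(21/2)/20
bp_2d = bp_ptm(20, 2)                          # floor(19/2)/20
bp_rank = bp_regular_trimmed_mean(20, 0.1)     # (2 + 1)/20
```

With the default $k$ the general expression reduces to $\lfloor (n+1)/2 \rfloor / n$ for $d = 1$ and to $\lfloor (n-d+1)/2 \rfloor / n$ for $d > 1$, in agreement with Theorem 3.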

4. Asymptotics. This section investigates the large sample behavior of
the sample projection depth trimmed means. We focus on the strong
consistency and the asymptotic normality of the estimators. To this end, we
(have to) characterize the asymptotic behavior of the random convex and
compact set $PD^{\alpha}(F_n)$, the sample projection depth region, via that of the
random directional radius $R^{\alpha}(u, F_n)$.

4.1. Strong consistency and asymptotic representation of the directional
radius. Denote by $F_{nu}$ the empirical distribution function of $u'X_i$, $i =
1, \ldots, n$, for $u \in S^{d-1}$. The following counterparts of (C1) and (C2) are
needed in the sequel:

(C1') $\mu(\cdot)$ and $\sigma(\cdot)$ at $F_u$ and $F_{nu}$ are continuous in $u \in S^{d-1}$ and $\sigma(F_u) > 0$,

(C2') $\sup_{\|u\|=1} |\mu(F_{nu}) - \mu(F_u)| = o(1)$, $\sup_{\|u\|=1} |\sigma(F_{nu}) - \sigma(F_u)| = o(1)$,
a.s.,

which hold for common choices of $(\mu, \sigma)$ and a wide range of distributions
$F$; see Remark 2.4 of [33] for a detailed account (also see [35]). Indeed,
for general M-estimators $\mu$ and $\sigma$ including (Med, MAD), (C1') holds for
suitable $F$ (see Lemma 5.1 of [33]), which in turn implies (C2') if $\mu(F_{nu_n})$
and $\sigma(F_{nu_n})$ are strongly consistent for $\mu(F_u)$ and $\sigma(F_u)$, respectively, for
any $u_n \to u \in S^{d-1}$. The latter is true for typical $\mu$ and $\sigma$ since $\mu(G)$ and
$\sigma(G)$ are typically continuous in $G$ in the sense that $\mu(G^{*}) \to \mu(G)$ and
$\sigma(G^{*}) \to \sigma(G)$, respectively, whenever $G^{*}$ becomes close enough to $G$ in
distribution (or in Smirnov-Kolmogorov distance) sense (see Example II.1
of [20] for the median functional). We have

Theorem 4. Under (C1')-(C2'), $\sup_{u \in S^{d-1}} |R^{\alpha}(u, F_n) - R^{\alpha}(u, F)| =
o(1)$, a.s.

Fig. 4. $R^{\alpha}(u, F)$ (solid circle) and $R^{\alpha}(u, F_n)$ (boundary of the shaded region) for
$F = N_2(0, I)$. Upper: left, $n = 100$; right, $n = 200$. Lower: left, $n = 300$; right, $n = 900$.

The main idea of the proof is as follows. Condition (C1') ensures that for
a fixed $x \in \mathbb{R}^d$ there are unit vectors $v(x)$ and $v_n(x)$ such that $O(x, F) =
g(x, v(x), F)$ and $O(x, F_n) = g(x, v_n(x), F_n)$ [see (1)]. This result enables us
to bound $R^{\alpha}(u, F_n) - R^{\alpha}(u, F)$ from above and below for any fixed $u \in S^{d-1}$.
Both the upper and the lower bounds are then shown to be $o(1)$ almost surely
and uniformly in $u \in S^{d-1}$. A crucial step for this is to show that $x'v(x)$ and
$x'v_n(x)$ are bounded below from 0 uniformly for any $x$ on the boundary of
$PD^{\alpha}(F)$ and $PD^{\alpha}(F_n)$, respectively.

The uniform strong consistency property of $R^{\alpha}(u, F_n)$ is illustrated in
Figure 4. Here $R^{\alpha}(u, F)$ and $R^{\alpha}(u, F_n)$ are plotted for $\alpha = 0.5$ and different $n$'s.
For simplicity, $F = N_2(0, I)$ is selected. $R^{\alpha}(u, F)$ then is the circle with
radius $\Phi^{-1}(3/4)$. The boundary of $PD^{\alpha}(F_n)$ is $R^{\alpha}(u, F_n)$. The uniform strong
consistency is clearly demonstrated as $\sup_{\|u\|=1} |R^{\alpha}(u, F_n) - R^{\alpha}(u, F)|$ gets
smaller when $n$ gets larger.
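
The setting of Figure 4 can be imitated with a rough numerical sketch. It rests on several assumptions that are not taken from the paper: the supremum over directions is approximated on a finite grid of angles, the ordinary MAD is used, $\beta(\alpha)$ is taken to be $(1-\alpha)/\alpha$ (the outlyingness level matching depth $\alpha$ when depth is $1/(1 + \text{outlyingness})$), the origin is assumed to lie inside the sample region, and the function name `directional_radius` is hypothetical.

```python
import numpy as np
from statistics import NormalDist

def directional_radius(X, u, alpha, n_dirs=360):
    """Approximate sample radius in direction u for bivariate data X.

    The region is approximated by the intersection of the slabs
    |v'x - Med(v'X)| <= beta * MAD(v'X) over a grid of unit vectors v.
    """
    beta = (1.0 - alpha) / alpha                       # assumed form of beta(alpha)
    theta = np.linspace(0.0, np.pi, n_dirs, endpoint=False)
    V = np.column_stack([np.cos(theta), np.sin(theta)])
    proj = X @ V.T                                     # projections, shape (n, n_dirs)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    lower, upper = med - beta * mad, med + beta * mad
    t = V @ u                                          # v'u for each grid direction
    bound = np.full(n_dirs, np.inf)
    pos, neg = t > 1e-12, t < -1e-12
    bound[pos] = upper[pos] / t[pos]                   # r * t <= upper
    bound[neg] = lower[neg] / t[neg]                   # r * t >= lower, with t < 0
    return bound.min()

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 2))                     # sample from N_2(0, I)
alpha = 0.5
population_radius = (1 - alpha) / alpha * NormalDist().inv_cdf(0.75)
angles = np.linspace(0.0, 2 * np.pi, 8, endpoint=False)
sample_radii = np.array(
    [directional_radius(X, np.array([np.cos(a), np.sin(a)]), alpha) for a in angles]
)
max_gap = np.max(np.abs(sample_radii - population_radius))
```

Only directions with angles in $[0, \pi)$ are scanned because $v$ and $-v$ define the same slab. Using a finite grid can only enlarge the approximated region, so the computed radii are, if anything, slightly too large relative to the exact sample radius.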

Remark 4.1. Under some (stronger) conditions on $F$, $PD^{\alpha}(F)$ are
continuous in Hausdorff distance sense, that is, $\rho(PD^{\alpha}, PD^{\alpha_0}) \to 0$ as $\alpha \to \alpha_0$,
where $\rho(A, B) = \inf\{\varepsilon \mid \varepsilon > 0, A \subset B^{\varepsilon}, B \subset A^{\varepsilon}\}$ and $C^{\varepsilon} = \{x \mid \inf\{\|x - y\| : y \in
C\} < \varepsilon\}$ (see Theorem 2.5 of [33]). With this continuity of the depth regions,
the result in the theorem can be established in a straightforward fashion.
For the halfspace depth regions and assuming this continuity, Nolan [19] first
obtained the strong consistency result for the radius of the halfspace depth
region.

To establish the normality of $R^{\alpha}(u, F_n)$, the counterpart of (C3) is needed:

(C3') The asymptotic representations hold uniformly in $u$:

$$\mu(F_{nu}) - \mu(F_u) = \frac{1}{n} \sum_{i=1}^{n} f_1(X_i, u) + o_p(n^{-1/2}),$$

$$\sigma(F_{nu}) - \sigma(F_u) = \frac{1}{n} \sum_{i=1}^{n} f_2(X_i, u) + o_p(n^{-1/2}).$$

The graphs of functions in $\{f_j(\cdot, u) : u \in S^{d-1}\}$ form a polynomial
discrimination class, $E f_j(X, u) = 0$ for $u \in S^{d-1}$, $E(\sup_{\|u\|=1} f_j(X, u)) < \infty$, for $j = 1$
or 2, and

$$E\Big[\sup_{|u_1 - u_2| \le \delta} |f_j(X, u_1) - f_j(X, u_2)|\Big] \to 0 \quad \text{as } \delta \to 0,\ j = 1, 2.$$

For the definition of a class of sets with polynomial discrimination, see [20].
Condition (C3') holds for general M-estimators of $(\mu, \sigma)$ including (Med,
MAD) and a wide range of $F$; see [35] for detailed accounts. For example,
when $(\mu, \sigma) = (\mathrm{Mean}, \mathrm{SD})$ and $E\|X\|^4$ exists, then $f_1(X, u) = u'(X - EX)$
and $f_2(X, u) = u'((X - EX)(X - EX)' - \mathrm{cov}(X))u / (2\sqrt{u'\,\mathrm{cov}(X)\,u})$ and
(C3') holds.

Theorem 5. Let $U(x)$ be a singleton for $x \in \partial PD^{\alpha}(F)$. Under (C1')-(C3'),

$$R^{\alpha}(u, F_n) - R^{\alpha}(u, F) = \frac{1}{n} \sum_{i=1}^{n} k(X_i, R^{\alpha}(u, F)u) + o_p(n^{-1/2}) \quad \text{a.s.}$$

uniformly in $u \in S^{d-1}$, where $k(x, y) = (\beta(\alpha) f_2(x, v(y)) + f_1(x, v(y))) /
(y_0' v(y))$, for any $y_0 = y/\|y\|$ with $y \ne 0$. Hence

$$\{\sqrt{n}(R^{\alpha}(u, F_n) - R^{\alpha}(u, F)) : u \in S^{d-1}\} \xrightarrow{d} \{Z^{\alpha}(u) : u \in S^{d-1}\},$$

with $Z^{\alpha}(u)$ being a zero-mean Gaussian process on the unit sphere with
covariance structure $E[k(X, R^{\alpha}(u_1, F)u_1)\, k(X, R^{\alpha}(u_2, F)u_2)]$ for unit vectors
$u_1$ and $u_2$.

By virtue of the lower and upper bounds for $R^{\alpha}(u, F_n) - R^{\alpha}(u, F)$
established in the proof of Theorem 4 and thanks to empirical process theory
(see [32] or [20]), the asymptotic representation for $R^{\alpha}(u, F_n)$ is obtained
after we show that $v_n(R^{\alpha}(u, F_n)u)$ converges to $v(R^{\alpha}(u, F)u)$ uniformly in
$u \in S^{d-1}$. The directional radius $R^{\alpha}(u, F_n)$ thus is asymptotically normal for
fixed $u \in S^{d-1}$ and also converges as a process to a Gaussian process indexed
by $u \in S^{d-1}$. Conditions in the theorem are met by typical M-estimators of
location and scale and a wide range of distribution functions $F$. For specific
$(\mu, \sigma)$, we have specific $k(x, y)$. For example, let $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD})$ and
$F = F_{0,\Sigma}$; then the following holds.

Corollary 3. Let $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD})$, $F = F_{0,\Sigma}$ with the density
$h_z$ of $Z$ continuous and $> 0$ in small neighborhoods of 0 and $m_0$. Then
Theorem 5 holds with



and $v(y) = \Sigma^{-1} y / \|\Sigma^{-1} y\|$ for any $y \ne 0$.

The proof of this result is skipped. For related discussion, see Lemma 5.1
of [33] and Lemma 3.2 of [35]. Equipped with the results on $R^{\alpha}(u, F_n)$, we
now are in position to discuss the asymptotics of the depth trimmed means.

4.2. Strong consistency and asymptotic representation of depth trimmed
means. Strong consistency holds for $PTM^{\alpha}(F_n)$ under very mild conditions
for $\alpha < \alpha^{*}$.

Theorem 6. Let $\mu(F_u)$ and $\sigma(F_u)$ be continuous in $u$ and $\sigma(F_u) > 0$
for $u \in S^{d-1}$ and let $w^{(1)}(\cdot)$ be continuous. Then under (C2'), $PTM^{\alpha}(F_n) -
PTM^{\alpha}(F) = o(1)$, a.s.

Again the theorem focuses on strong consistency. Other types of
consistency can be established accordingly under appropriate versions of (C2').
Note that the weight function in the theorem can be a nonzero constant.

With standard means, the proof of the theorem seems challenging. The
difficulty lies in handling the integral region $PD^{\alpha}(F_n)$ [or integrand
containing $I(x \in PD^{\alpha}(F_n))$] in (5). The problem becomes less difficult with the
help of empirical process theory. The main tool employed in our proof is the
generalized Glivenko-Cantelli theorem for a class of measurable functions
whose graphs form a class with polynomial discrimination (see II.5 of [20]),
or a Glivenko-Cantelli class of measurable functions (see [32]).

We now establish the limiting distribution of $PTM^{\alpha}(F_n)$ via an
asymptotic representation. Assume that $F^{(1)} = f$ and $w^{(1)}$ exist. Replace
$IF(x, R^{\alpha}(u, F), F)$, $IF(v(y)'x, \mu, F_{v(y)})$ and $IF(v(y)'x, \sigma, F_{v(y)})$ with
$k(x, R^{\alpha}(u, F)u)$, $f_1(x, v(y))$ and $f_2(x, v(y))$, respectively, in $l_i(x)$ and $h(x, y)$
and call the resulting functions $\tilde{l}_i(x)$, $i = 1, 2, 3$, and $\tilde{h}(x, y)$, respectively.
We have:

Theorem 7. Assume that $f$ is continuous in a small neighborhood of
$\partial PD^{\alpha}(F)$, $P_F(\partial PD^{\alpha}(F)) = 0$ and $w^{(1)}$ is continuous. Then under (C1')-(C3')
and (C4)

$$PTM^{\alpha}(F_n) - PTM^{\alpha}(F) = \frac{1}{n} \sum_{i=1}^{n} (\tilde{l}_1(X_i) + \tilde{l}_2(X_i) + \tilde{l}_3(X_i)) + o_p(1/\sqrt{n}).$$

Thus $\sqrt{n}(PTM^{\alpha}(F_n) - PTM^{\alpha}(F)) \xrightarrow{d} N_d(0, V)$, where $V = \mathrm{cov}(\tilde{l}_1(X) +
\tilde{l}_2(X) + \tilde{l}_3(X))$.

With standard tools, it seems extremely challenging to establish the
asymptotic representation and the normality of $PTM^{\alpha}(F_n)$. Thanks to empirical
process theory for a Donsker class of functions, especially the asymptotic
tightness of the sequence of empirical processes and the asymptotic
equicontinuity result and the central limit theorem for the empirical process indexed
by a class of functions (see [20] or [32]), we are able to tackle the problem.
One key step in our proof is to characterize the complicated integral region
$PD^{\alpha}(G)$ via the directional radius function $R^{\alpha}(u, G)$ for $G = F$ and $F_n$.

With a nonzero constant $w$, $PTM^{\alpha}(F_n)$ becomes a depth trimmed mean
with equal weight assigned to each sample point within $PD^{\alpha}(F_n)$. The
representation is simplified since $\tilde{l}_2(x)$ vanishes and $\tilde{l}_1(x)$ and $\tilde{l}_3(x)$ also become
less complicated.

Conditions in the theorem are met by typical M-estimators of location
and scale and a wide range of distributions $F$. For example, when $(\mu, \sigma) =
(\mathrm{Med}, \mathrm{MAD})$ and $F$ is elliptically symmetric about the origin (assume $\Sigma = I$,
w.l.o.g.), we have:

Corollary 4. Let $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD})$ and $F = F_{0,I}$ with $F' = f$
being continuous in a small neighborhood of $\|y\| = \beta(\alpha) m_0 = c(\alpha)$ and the
density $h_z$ of $Z$ being continuous and positive in small neighborhoods of
0 and $m_0$. Let $w^{(1)}$ be continuous. Then conditions in Theorem 7 hold and
$\tilde{l}_1(x) + \tilde{l}_2(x) + \tilde{l}_3(x)$ is

I / c(a)w{a)f(c(a)u)\J(u,c{a))\k(x;c{a)u)udu 

Ikll + lkll/"io))/'^°' 

where $c_0$ and $c_1$ are defined in Corollary 2 and $k(x; c(\alpha)u) = \beta(\alpha)(\tfrac{1}{2} -
I(|u'x| \le m_0))/(2h_z(m_0)) + (\tfrac{1}{2} - I(u'x \le 0))/h_z(0)$.

The asymptotic normality in Theorem 7 and Corollary 4 is illustrated
in Figure 5. Here 2000 $PTM^{\alpha}(F_n)$'s are obtained based on $N_2(0, I)$ with
$n = 300$ and $\alpha = 0.36$. The two-dimensional histogram indicates a (roughly)
normal shape, and so do the one-dimensional histograms of the x- and
y-coordinates of the $PTM^{\alpha}(F_n)$'s.

5. Efficiency. Besides robustness, efficiency is another fundamental issue
in statistics. It is also a key performance criterion for any statistical
procedure. Section 3 reveals that $PTM^{\alpha}$ is robust locally and globally for suitable
choices of $\mu$ and $\sigma$. A natural question is: Is $PTM^{\alpha}$ also highly efficient at
normal and other models? This section answers the question at both large
and finite samples.

5.1. Large sample relative efficiency. Consider for simplicity the case
that $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD})$ and $w = c > 0$. Following the convention in the
location setting, assume that $F = F_{\theta,\Sigma}$. By affine equivariance, assume,
w.l.o.g., that $\theta = 0$ and $\Sigma = I$. Furthermore, assume that $F' = f$ and the
density $h_z$ of $Z$ is continuous and positive at 0 and $m_0$. By Theorems 7 and
5 and Corollaries 4 and 3, we have

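Corollary 5. Let $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD})$ and $F = F_{0,I}$ meet the
conditions in Corollary 4. Let $w = c \ne 0$. Then results in Theorem 7 hold with
$\tilde{l}_2(x) = 0$ and

$$\tilde{l}_1(x) + \tilde{l}_3(x) = \frac{\int_{S^{d-1}} c(\alpha) f(c(\alpha)u)\, |J(u, c(\alpha))|\, k(x, c(\alpha)u)\, u \, du + x I(\|x\| \le c(\alpha))}{P(\|X\| \le c(\alpha))},$$

where $k(x, c(\alpha)u) = \beta(\alpha)\,\mathrm{sign}(|u'x| - m_0)/(4h_z(m_0)) + \mathrm{sign}(u'x)/(2h_z(0))$ and
$\sqrt{n}(PTM^{\alpha}(F_n) - PTM^{\alpha}(F)) \xrightarrow{d} N_d(0, V)$, where for $X = (X_1, \ldots, X_d)'$

$$V = \frac{E\big(X_1 I(\|X\| \le c(\alpha)) + \int_{S^{d-1}} c(\alpha) f(c(\alpha)u)\, |J(u, c(\alpha))|\, k(X, c(\alpha)u)\, u_1 \, du\big)^2}{P^2(\|X\| \le c(\alpha))} \times I_{d \times d}.$$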

The key ingredient of the proof of the corollary is repeatedly taking ad- 
vantage of the symmetry of F in a nontrivial (and clever) manner. The 
proof, albeit not very challenging technically, is quite involved and hence is 
skipped. 

The explicit form of $V$ greatly facilitates the calculation of the asymptotic
relative efficiency of $PTM^{\alpha}(F_n)$. Note that $E X_1 I(\|X\| \le c(\alpha))\,\mathrm{sign}(|u'X| -
m_0) = 0$, which further simplifies the calculation. Call the denominator of $V$
$a$ and the numerator $b$; then $V = (b/a) I_{d \times d}$. Hence the asymptotic efficiency
of $PTM^{\alpha}(F_n)$ relative to the sample mean is $(a \sigma_z^2)/b$ with $\sigma_z^2 = \mathrm{var}(Z)$.
For $X \sim N_d(0, I)$, we have $\sigma_z^2 = 1$, $a = (P(T \le c^2(\alpha)))^2$ with $T \sim \chi^2(d)$,
$f(c(\alpha)u) = g(c(\alpha)) = e^{-c^2(\alpha)/2}/(2\pi)^{d/2}$ and $m_0 = \Phi^{-1}(3/4)$. When $d = 2$,
$u = (\cos(\theta), \sin(\theta))$, $a = (1 - e^{-c^2(\alpha)/2})^2$ and

$$b = E\Big(X_1 I(\|X\| \le c(\alpha)) + \int_0^{2\pi} c^2(\alpha)\, g(c(\alpha))\, k(X, c(\alpha)u) \cos(\theta)\, d\theta\Big)^2.$$

In Table 1 we list the asymptotic relative efficiency (ARE) results of
$PTM^{\alpha}$ for different $\alpha$'s, of the Stahel-Donoho estimator (see [35]) and of
the halfspace median (HM) and the projection median (PM) (see [33]) at
$N_2(0, I)$.

Table 1
ARE of depth trimmed means and medians relative to the mean

$PTM^{0.05}$                                SD      PM     HM     Mean
0.9990        0.9981    0.9927    0.8856    0.935   0.77   0.76   1.00

It is seen that $PTM^{\alpha}$ is highly efficient for small $\alpha$'s and is much more
efficient than some leading competitors. Replacing Med in $PTM^{\alpha}$ (and PM)
with a more efficient one at normal (and other) models, one can improve the
efficiencies of $PTM^{\alpha}$ (and PM); see [33] for discussion related to PM. Our
calculations indicate that when the "tail" of $F$ gets heavier, $PTM^{\alpha}$ can get
more efficient than the mean. Furthermore, when $d$ increases, the ARE of
$PTM^{\alpha}$, as expected, increases.

5.2. Finite sample relative efficiency. The comparisons of the relative 
efficiency results in the last subsection have all been asymptotic, and this 
raises the question as to whether they are relevant at finite sample prac- 
tice. Asymptotic results indeed are quite vulnerable to criticism about their 
practical merits. We now address this issue in this subsection through finite 
sample Monte Carlo studies. 

To see how $PTM^{\alpha}$ performs in a neighborhood of a normal model, we
generate $m = 1000$ samples for different sizes $n$ from the model
$(1 - \varepsilon) N_2((0, 0)', I) + \varepsilon N_2((\mu, \mu)', \sigma^2 I)$ with $\mu = 10$ and $\sigma = 5$ and
$\varepsilon = 0.0$, 0.1 and 0.2. For simplicity, we just consider the case $\alpha = 0.1$.
Included in our study are Stahel-Donoho (SD) [35], PM [33] and HM estimators.
We assume that all the estimators aim at estimating the known location
parameter $\theta = (0, 0)' \in \mathbb{R}^2$.

For an estimator $T$ we calculate its "empirical mean squared error" (EMSE)
$\sum_{i=1}^{m} \|T_i - \theta\|^2 / m$, where $T_i$ is the estimate based on the $i$th sample. The
relative efficiency (RE) of $T$ w.r.t. the mean is obtained by dividing the
EMSE of the mean by that of $T$. Here $(\mu, \sigma) = (\mathrm{Med}, \mathrm{MAD})$, $w = c \ne 0$.
Tables 2-4 list some efficiency results relative to the mean. The entries in
parentheses are EMSE $\times 10^3$.
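
The EMSE and RE computations described above can be sketched as a small Monte Carlo experiment. This is an illustration under assumptions that differ from the study itself: projection depth is approximated with randomly drawn directions, the ordinary MAD replaces $\mathrm{MAD}_k$, far fewer replications are used, and the names `projection_depth`, `ptm`, `emse` and `relative_efficiency` are hypothetical.

```python
import numpy as np

def projection_depth(X, n_dirs=200, rng=None):
    """Approximate PD(X_i, F_n) with (Med, MAD) over random unit directions."""
    rng = np.random.default_rng() if rng is None else rng
    U = rng.standard_normal((n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    proj = X @ U.T                                    # shape (n, n_dirs)
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    outlyingness = np.max(np.abs(proj - med) / mad, axis=1)
    return 1.0 / (1.0 + outlyingness)                 # depth taken as 1 / (1 + O)

def ptm(X, alpha=0.1, **kwargs):
    """Depth trimmed mean with constant weight: average points with depth >= alpha."""
    depth = projection_depth(X, **kwargs)
    return X[depth >= alpha].mean(axis=0)

def emse(estimates, theta):
    """Empirical mean squared error: average of ||T_i - theta||^2."""
    return np.mean(np.sum((estimates - theta) ** 2, axis=1))

def relative_efficiency(n, eps, m=100, seed=0):
    """RE of the trimmed mean w.r.t. the sample mean under the mixture model."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    means, ptms = [], []
    for _ in range(m):
        X = rng.standard_normal((n, 2))               # N_2((0,0)', I)
        bad = rng.random(n) < eps                     # contaminated observations
        X[bad] = 10.0 + 5.0 * rng.standard_normal((int(bad.sum()), 2))
        means.append(X.mean(axis=0))
        ptms.append(ptm(X, alpha=0.1, rng=rng))
    return emse(np.array(means), theta) / emse(np.array(ptms), theta)

re_clean = relative_efficiency(40, 0.0)
re_contaminated = relative_efficiency(40, 0.1)
```

Because of the random-direction approximation and the small number of replications, the resulting figures should not be expected to reproduce the table entries; the sketch is only meant to show how the EMSE and RE are formed.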

Table 2 reveals that for a perfect $N_2(0, I)$ model $PTM^{0.1}$ is extremely
(and the most) efficient. The consistency of RE's with the ARE's confirms
the validity of the results in Table 1. The SD estimator is the second most
(about 93%) efficient and the PM and HM with roughly the same efficiency
are the least efficient ones.

Table 2
Finite sample efficiency of $PTM^{\alpha}$ relative to the sample mean

$N_2((0, 0)', I)$

n      $PTM^{0.1}$   SD         PM         HM         Mean
20     0.9843        0.9381     0.7999     0.8053     1.0000
       (104.24)      (109.38)   (128.27)   (127.42)   (102.61)
40     0.9985        0.9298     0.7822     0.7732     1.0000
       (50.560)      (54.299)   (64.546)   (65.296)   (50.485)
60     0.9984        0.9347     0.7675     0.7671     1.0000
       (32.941)      (35.187)   (42.850)   (42.873)   (32.889)
80     1.0000        0.9387     0.7782     0.7762     1.0000
       (25.146)      (26.787)   (32.314)   (32.398)   (25.146)
100    0.9995        0.9338     0.7762     0.7645     1.0000
       (20.014)      (21.421)   (25.770)   (26.061)   (20.003)

In practice, data more often than not follow a model that is not
perfectly normal. Typical examples include contaminated normal (or mixture
normal) models. This raises the question of the practical relevance (or
robustness) of the results in Table 2. Tables 3 and 4 indicate that $PTM^{0.1}$
has very (most) robust EMSE's. Indeed under $\varepsilon = 0.1$ and 0.2, the EMSE's
of $PTM^{0.1}$ are still very close to those with $\varepsilon = 0.0$. This robustness
increasingly degenerates for SD, PM and HM. The mean has the least robust
EMSE's. Indeed, the EMSE's of the mean change drastically (enlarged 100
times or more) under the contaminations. With slight departures from
normality, all the depth estimators become overwhelmingly more efficient than
the mean while $PTM^{0.1}$ performs substantially better than its competitors.
The results here for SD, PM and HM are very consistent with those in [35]
and [33].

Our simulation studies indicate that the above findings also hold true
for other nonnormal (such as t, double-exponential and logistic) models.
Furthermore, the relative efficiency of $PTM^{\alpha}$ increases as the dimension $d$
increases.

Table 3
Finite sample efficiency of $PTM^{\alpha}$ relative to the sample mean

$0.90 N_2((0, 0)', I) + 0.10 N_2((10, 10)', 25I)$

n      $PTM^{0.1}$   SD         PM         HM         Mean
20     19.688        17.746     14.392     13.851     1.0000
       (121.34)      (134.61)   (165.99)   (172.47)   (2388.9)
40     37.455        29.716     21.775     21.958     1.0000
       (58.615)      (73.878)   (100.82)   (99.980)   (2195.4)
60     54.039        39.880     27.763     27.848     1.0000
       (39.687)      (53.778)   (77.249)   (77.014)   (2144.7)
80     71.694        49.216     32.799     32.500     1.0000
       (29.620)      (43.149)   (64.746)   (65.342)   (2123.6)
100    83.543        53.948     35.536     35.198     1.0000
       (24.760)      (38.343)   (58.210)   (58.770)   (2068.6)

Table 4
Finite sample efficiency of $PTM^{\alpha}$ relative to the sample mean

$0.80 N_2((0, 0)', I) + 0.20 N_2((10, 10)', 25I)$

n      $PTM^{0.1}$   SD         PM         HM         Mean
20     51.779        37.573     28.653     27.416     1.0000
       (167.26)      (230.50)   (302.25)   (315.89)   (8660.5)
40     89.745        52.678     35.790     34.733     1.0000
       (93.209)      (158.80)   (233.73)   (240.84)   (8356.0)
60     121.86        62.637     40.018     39.876     1.0000
       (68.058)      (132.41)   (207.24)   (207.98)   (8293.4)
80     155.80        68.658     43.001     42.364     1.0000
       (52.973)      (120.21)   (191.93)   (194.82)   (8253.1)
100    176.39        71.282     43.807     43.385     1.0000
       (46.144)      (114.18)   (185.80)   (187.60)   (8139.3)




6. Concluding remarks. We now account for the motivation of selecting 
projection depth for multivariate trimming, review some related trimmed 
means and studies in the literature, address the computing issues, discuss 
some practical choices of a values and summarize the major results obtained 
in this paper. 

6.1. Why projection depth trimmed means? There are a number of depth 
notions in the literature; see [16], for example. For any given notion, one 
can define and study the corresponding depth trimmed means. Among the 
existing notions, the projection depth represents a very favorable one; see 
[33, 34, 37]. Tukey halfspace depth, also built based on projection pursuit 
methodology, is its major competitor. The projection depth, as a center- 
outward strictly monotone function, conveys more information about data 
points than the halfspace depth, a center-outward step function, does. As 
a matter of fact, the projection depth and its induced estimators can out- 
perform the halfspace depth and its induced estimators with respect to two 
central performance criteria: robustness and efficiency. For example, the
halfspace depth itself is much less robust than the projection depth (with
appropriate choices of $\mu$ and $\sigma$). Indeed, the former has the lowest breakdown point
($\approx 0$) whereas the latter can have the highest breakdown point ($\approx 1/2$); see
[34]. The estimators induced from the halfspace depth are also less robust
than those from the projection depth. For example, the breakdown point
of the halfspace median is $\le 1/3$ [6] whereas that of the projection median
can be about 1/2 [33], the highest among all affine equivariant location
estimators in high dimensions. The breakdown point of the $\alpha$th trimmed mean
based on the halfspace depth is $(\lfloor \alpha n \rfloor + 1)/n$ whereas that of the one based
on the projection depth is $\lfloor (n-d+1)/2 \rfloor / n$ (Theorem 3). On the other
hand, the efficiency of the bivariate halfspace median relative to the mean
is about 77% whereas that of the projection median can be as high as 95%; 
see [33]. The projection depth trimmed means can also be more efficient 
than those based on the halfspace depth. This is especially true when the 
underlying model slightly deviates from the assumed symmetric one (such 
as in the contaminated normal model cases). The robustness and efficiency 
advantages motivate us to focus on projection depth trimming. Note that 
with general choices of $(\mu, \sigma)$ we deal with a class of depth trimmed means
instead of a single one as in the halfspace depth case. This is yet another 
motivation for projection depth trimming. The approaches and techniques 
in this paper, however, are applicable to other depth trimmed means and 
covariance matrices as well. 

6.2. Related estimators and studies in the literature. First we note that 
by combining the integral regions with the integrands, (5) can be trivially 
written as 

$$PTM^{\alpha}(F) = \int w^{*}(PD(x, F))\, x \, dF(x) \Big/ \int w^{*}(PD(x, F))\, dF(x), \qquad (12)$$

with $w^{*}(s) = w(s) I(s \ge \alpha)$. Indeed we use this form repeatedly in the proofs.
We adopt (5) [not (12)] since it is consistent with the regular univariate
trimmed mean definition and manifests the depth trimming idea more
clearly. Depth trimmed means with the form (12) have been discussed by
Dümbgen [8] for simplicial depth, by Massé [18] for halfspace depth and
by Zuo, Cui and He [35] for general depth. These discussions, however, are
based on the assumption that $w^{*}(s)$ is continuously differentiable, which
straightforwardly excludes (12) with $w^{*}(s) = w(s) I(s \ge \alpha)$. The difference
here between continuous differentiability on a closed interval and
discontinuity, seemingly very minor since it is understood that one can approximate
the discontinuous function by a sequence of continuously differentiable ones,
turns out to be crucial. The immediate problem with the sequence approach
is the unbounded derivatives of the approximating functions. Boundedness
is essential in the treatments of Dümbgen [8], Massé [18] and Zuo, Cui and
He [35]. To deal with (alleviate) the unboundedness effect one is (essentially)
forced to construct a random sequence depending on the convergence rate of
the process $\sqrt{n}(PD(x, F_n) - PD(x, F))$. This, however, seems infeasible, or
the halfspace depth median would have been asymptotically normal, which
is not true, as shown in [1].

Note that with a nonzero constant $w$, (5) admits a 0-1 trimming scheme,
which is the one used in the regular (univariate) trimming. This, however, is
not the case with Dümbgen [8], Massé [18] and Zuo, Cui and He [35], where
a continuously differentiable $w^{*}$ is assumed. This is yet another difference
between this and the other papers.

Now in one dimension with the same 0-1 trimming scheme, (5) in this paper
introduces a new type of trimmed mean that is different from the regular
univariate trimmed mean (which corresponds to halfspace depth trimming)
as well as the metrically trimmed mean (see [2, 13]). Indeed, the trimming
in this paper is based on the "generalized standardized deviation" (the out- 
lyingness), whereas the regular trimming is based on the ranks of sample 
points. The metrically trimmed mean uses deviations to trim. But as in the 
regular trimming case, it always trims a fixed fraction of sample points. The 
projection depth trimming in this paper trims sample points only when they 
are "bad." The advantages of this trimming scheme include the gain in ro- 
bustness (see comments after Theorem 3) and in efficiency for models which 
slightly deviate from the assumed symmetric ones. 

Based on halfspace depth, Donoho and Gasko [5] introduced a trimmed 
mean [corresponding to (5) with a nonzero constant w and dF{x) replaced 
by dx] and studied its breakdown point; Nolan [19] and van der Vaart 
and Wellner [32] studied the asymptotic normality and the Hadamard- 
differentiability, respectively, of the same estimator. When introducing the 
notion of simplicial depth, Liu [14] also defined a depth trimmed mean, 
which is not based on depth regions, though. 

6.3. Computing projection depth trimmed means. Like all other affine 
equivariant high-breakdown procedures, the projection depth trimmed means 
are computationally intensive. Exact computing in high dimensions, though 
possible (and an algorithm for two-dimensional data exists), is infeasible. 
Approximate computing is much faster and quite reliable and is sufficient in 
most applications. Basic approaches include randomly selecting projection 
directions or selecting among those perpendicular to the hyperplanes con- 
taining d data points. Feasible approximate algorithms for high-dimensional 
data exist and are utilized in this paper. 
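
The second basic approach mentioned above, directions perpendicular to hyperplanes through $d$ data points, can be sketched for $d = 2$, where each pair of points defines a line and hence a normal direction. This is an assumption-laden illustration rather than the algorithm used in the paper: it uses the ordinary MAD, enumerates all pairs (feasible only for small $n$), and the names `pairwise_normal_directions` and `outlyingness` are hypothetical.

```python
import numpy as np

def pairwise_normal_directions(X):
    """Unit normals to the lines through every pair of bivariate data points."""
    i, j = np.triu_indices(len(X), k=1)
    diff = X[j] - X[i]
    normals = np.column_stack([-diff[:, 1], diff[:, 0]])   # rotate by 90 degrees
    norms = np.linalg.norm(normals, axis=1)
    keep = norms > 0                                       # skip coincident points
    return normals[keep] / norms[keep, None]

def outlyingness(x, X, dirs):
    """Approximate outlyingness of x: max over dirs of |u'x - Med| / MAD."""
    proj = X @ dirs.T
    med = np.median(proj, axis=0)
    mad = np.median(np.abs(proj - med), axis=0)
    return np.max(np.abs(dirs @ x - med) / mad)

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 2))
dirs = pairwise_normal_directions(X)                       # 30 * 29 / 2 directions
far_point = outlyingness(np.array([50.0, 50.0]), X, dirs)
central_point = outlyingness(np.median(X, axis=0), X, dirs)
```

The number of candidate directions grows like $n^2$ in two dimensions and like $\binom{n}{d}$ in general, which is one reason random subsets of such directions are commonly used instead of the full enumeration.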

6.4. Choice of $\alpha$ values. A very legitimate practical concern for $PTM^{\alpha}$
is the choice of the $\alpha$ value. Empirical evidence indicates that an $\alpha$ value
around 0.05 to 0.1 can lead to a very efficient $PTM^{\alpha}$ at both light- and
heavy-tailed distributions. One, of course, might also adopt an adaptive
data-driven approach to determine an appropriate $\alpha$ value. For a given data
set, an $\alpha$ value is determined based on the behavior of the tail of the data set.
Generally speaking, a small value of $\alpha$ (e.g., 0.05) is selected for light-tailed
data while a larger value is selected for a heavy-tailed one.

6.5. Main results obtained in the paper. This paper introduces projection
depth trimmed means, examines their performance with respect to two
principal criteria, robustness and efficiency, and establishes their limiting
distribution via asymptotic representations. It turns out that the depth trimmed
means can be highly robust locally (with bounded influence functions) as
well as globally (with the best breakdown point among affine equivariant
competitors). Robustness and efficiency do not work in tandem in general.
Results obtained in the paper indicate, however, that the depth trimmed
means, unlike the mean and the (halfspace) depth median, can keep a very
good balance between the two. At normal and other light-tailed symmetric
models, they (with high relative efficiency for suitable $\alpha$'s) are better
choices than the depth medians and strong competitors to the sample mean, 
which is the best (in terms of efficiency) at the normal model. At contam- 
inated (normal or symmetric) models (the more realistic ones in practice) 
and heavy-tailed models, they, with very robust and overwhelmingly high 
efficiency, are much better choices than the sample mean and other depth 
competitors. As a by-product, this paper introduces a new type of trimmed 
mean in one dimension which can have advantages over the regular and met- 
rically trimmed means with respect to the two central performance criteria, 
robustness and efficiency. 

APPENDIX: SELECTED PROOFS AND AUXILIARY LEMMAS 

Proof of Theorem 1. To prove the theorem, we need the following 
lemmas. 

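Lemma A.1. Under (C0) and (C2) and for fixed $x \in \mathbb{R}^d$ and very small
$\varepsilon > 0$:

(a) $PD(y, F)$ and $PD(y, F(\varepsilon, \delta_x))$ are Lipschitz continuous in $y \in \mathbb{R}^d$;

(b) $\sup_{y \in \mathbb{R}^d} (1 + \|y\|)\, |PD(y, F(\varepsilon, \delta_x)) - PD(y, F)| = o_x(1)$;

(c) $\partial PD^{\alpha}(G) = \{y : PD(y, G) = \alpha\}$ for $0 < \alpha < \alpha^{*}$ and $G = F$ or $F(\varepsilon, \delta_x)$;

(d) $PD^{\alpha+\eta}(F) \subset PD^{\alpha+\eta/2}(F(\varepsilon, \delta_x)) \subset PD^{\alpha}(F)$ for any $0 < \eta < \alpha^{*} - \alpha$.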

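Proof. Conditions (C0) and (C2) imply that $\sup_{\|u\|=1} |\mu(F_{\varepsilon u})|$ and
$\sup_{\|u\|=1} \sigma(F_{\varepsilon u})$ are finite for sufficiently small $\varepsilon > 0$. It is readily seen that
$|\inf_{\|u\|=1} \sigma(F_{\varepsilon u}) - \inf_{\|u\|=1} \sigma(F_u)| \le \sup_{\|u\|=1} |\sigma(F_{\varepsilon u}) - \sigma(F_u)| = o_x(1)$. This,
together with (C0), implies that $\inf_{\|u\|=1} \sigma(F_{\varepsilon u})$ is bounded below from 0 for
fixed $x$ and sufficiently small $\varepsilon$. Hence for sufficiently small $\varepsilon > 0$,

(C0') $\sup_{\|u\|=1} \mu(F_{\varepsilon u}) < \infty$, $0 < \inf_{\|u\|=1} \sigma(F_{\varepsilon u}) \le \sup_{\|u\|=1} \sigma(F_{\varepsilon u}) < \infty$.

The Lipschitz continuity of $PD(\cdot, F)$ can be established by following the
proof of Theorem 2.2 of [33]. For that of $PD(\cdot, F_{\varepsilon})$, we observe that for
small $\varepsilon > 0$

$$|PD(y_1, F_{\varepsilon}) - PD(y_2, F_{\varepsilon})| \le |O(y_1, F_{\varepsilon}) - O(y_2, F_{\varepsilon})| \le \|y_1 - y_2\| \Big/ \inf_{\|u\|=1} \sigma(F_{\varepsilon u}).$$

This and (C0') lead to the Lipschitz continuity of $PD(\cdot, F_{\varepsilon})$. Part (a) follows.
To show part (b), first we observe that

$$(1 + \|y\|)\, |PD(y, F_{\varepsilon}) - PD(y, F)|$$
$$\le \frac{1 + \|y\|}{1 + O(y, F_{\varepsilon})} \cdot \frac{O(y, F) \sup_{\|u\|=1} |\sigma(F_{\varepsilon u}) - \sigma(F_u)| + \sup_{\|u\|=1} |\mu(F_{\varepsilon u}) - \mu(F_u)|}{\inf_{\|u\|=1} \sigma(F_{\varepsilon u})\,(1 + O(y, F))}$$
$$\le \frac{1 + \|y\|}{1 + |\|y\| - \mu(F_{\varepsilon y_0})| / \sup_{\|u\|=1} \sigma(F_{\varepsilon u})} \times \frac{\sup_{\|u\|=1} |\sigma(F_{\varepsilon u}) - \sigma(F_u)| + \sup_{\|u\|=1} |\mu(F_{\varepsilon u}) - \mu(F_u)|}{\inf_{\|u\|=1} \sigma(F_{\varepsilon u})},$$

where $y_0 = y/\|y\|$. Part (b) now follows immediately from (C0') and (C2).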

Part (c) with G = F is covered by Theorem 2.3 of [33]. To show the case 
G = F^, first we note that PD'^{F^) is nonempty for sufficiently small e > 
since by (b) for any (9 E M"' with PD{9,F) = a*, PD{e,Fe) > a for smah 
e>0. 

We now show that {y : PD{y, F^) = a} C dPD°^{Fg), the boundary of 
PD"{F^). Let PD{y,F£) =a. Such y exists since (i) by (CO') we can show 
that PD{z,Fs) ^ as \\z\\ oo (see Theorem 2.1 of [33]) and (ii) PD{-,Fe) 
is Lipschitz continuous by (a) and PD{6,Fe) > a for sufficiently small e > 0. 
Assume that y ^ dPD°'{Fi;); that is, y is an interior point of PD"{F^). Then 
there is a small ball centered at y with radius r and contained in the interior 
of PD°^{F^). By the scale equivariance of /i we see immediately that there 
is a direction uq such that 

{u'^y- ^^{Feuo))/<^{Feuo)>0{y,Fe)-r/ sup a{Feu) 

\\u\\=l 

for sufficiently small e such that sup||„||=;^ cr(F£„) < oo. On the other hand, 
we see that y' = y + u^r £ PD"{Fs) and 

oiv'M > "'^'t;'^-"' = "'"t;'^-"' + -f - > o(,.F.). 

But this implies that PD{y',F^) < PD(y,F^) = a, which is a contradiction. 

We now show that $\partial PD^\alpha(F_\varepsilon) \subseteq \{y : PD(y,F_\varepsilon) = \alpha\}$. Let $y \in \partial PD^\alpha(F_\varepsilon)$. Then by the continuity of $PD(\cdot,F_\varepsilon)$ for sufficiently small $\varepsilon > 0$, we conclude that $PD(y,F_\varepsilon) \ge \alpha$. If $PD(y,F_\varepsilon) > \alpha$, then by the continuity of $PD(\cdot,F_\varepsilon)$ there is a small ball $B(y)$ centered at $y$ with $PD(z,F_\varepsilon) > \alpha$ for all $z \in B(y)$ for sufficiently small $\varepsilon > 0$. But this contradicts the assumption that $y \in \partial PD^\alpha(F_\varepsilon)$. Part (c) follows.

By part (a), for the given $0 < \eta < \alpha^* - \alpha$ and any $y \in PD^{\alpha+\eta/2}(F_\varepsilon)$,
\[
PD(y,F) \ge PD(y,F_\varepsilon) - \eta/2 \ge \alpha + \eta/2 - \eta/2 = \alpha
\]
for sufficiently small $\varepsilon > 0$. Likewise, for any $y \in PD^{\alpha+\eta}(F)$, by part (a)
\[
PD(y,F_\varepsilon) \ge PD(y,F) - \eta/2 \ge \alpha + \eta - \eta/2 = \alpha + \eta/2
\]
for sufficiently small $\varepsilon > 0$. Part (d) follows immediately. □
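The quantities used throughout this proof, the outlyingness $O(x,F)$ and the depth $PD(x,F) = 1/(1+O(x,F))$, can be approximated for a sample as in the following minimal sketch. It assumes $(\mu,\sigma) = (\mathrm{Med},\mathrm{MAD})$ and replaces the supremum over all unit vectors by a maximum over finitely many random directions; the function names are hypothetical and the result is only an approximation (it underestimates the outlyingness, hence overestimates the depth).

```python
import math
import random
import statistics


def random_unit_vectors(d, k, rng):
    """Draw k directions uniformly on the unit sphere in R^d."""
    dirs = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        dirs.append([c / norm for c in v])
    return dirs


def outlyingness(x, data, dirs):
    """Approximate O(x, F_n) = sup_u |u'x - Med(u'X)| / MAD(u'X).

    The supremum is taken only over the supplied directions (an assumption
    of this sketch, not an exact computation).
    """
    worst = 0.0
    for u in dirs:
        proj = [sum(ui * zi for ui, zi in zip(u, z)) for z in data]
        med = statistics.median(proj)
        mad = statistics.median([abs(p - med) for p in proj])
        ux = sum(ui * xi for ui, xi in zip(u, x))
        worst = max(worst, abs(ux - med) / mad)
    return worst


def projection_depth(x, data, dirs):
    """PD(x, F_n) = 1 / (1 + O(x, F_n))."""
    return 1.0 / (1.0 + outlyingness(x, data, dirs))
```

For a roughly centered cloud, a point near the center should receive a depth close to 1 and a remote point a depth close to 0, in line with $PD(z,F_\varepsilon) \to 0$ as $\|z\| \to \infty$.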

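Lemma A.2. Under (C1)-(C2), $\sup_{\|u\|=1} |R^\alpha(u,F(\varepsilon,\delta_x)) - R^\alpha(u,F)| = o_x(1)$.

Proof. By (C1), for any $x \in \mathbb{R}^d$ there is a unit vector $v(x)$ such that $g(x,v(x),F) = O(x,F)$, where $g(x,\cdot,F)$ is defined before (2). Let $v(u) := v(R(u)u)$. By (C1) [hence (C0)] and Lemma A.1, $O(R(u)u,F) = \beta(\alpha)$. Thus

\[
R(u)\,u'v(u) = \beta(\alpha)\sigma(F_{v(u)}) + \mu(F_{v(u)}), \qquad R(u)\,u'v \le \beta(\alpha)\sigma(F_v) + \mu(F_v)
\]

for any $v \in S^{d-1}$. Likewise, for $v_\varepsilon(u) := v(R^\alpha(u,F(\varepsilon,\delta_x))u)$ and small $\varepsilon > 0$,

\[
g(R^\alpha(u,F(\varepsilon,\delta_x))u, v_\varepsilon(u), F(\varepsilon,\delta_x)) = O(R^\alpha(u,F(\varepsilon,\delta_x))u, F(\varepsilon,\delta_x)) = \beta(\alpha).
\]

Again for convenience, write $R_\varepsilon(u)$ or $R_\varepsilon^\alpha(u)$ for $R^\alpha(u,F(\varepsilon,\delta_x))$. Hence we have

\[
R_\varepsilon(u)\,u'v_\varepsilon(u) = \beta(\alpha)\sigma(F_{\varepsilon v_\varepsilon(u)}) + \mu(F_{\varepsilon v_\varepsilon(u)}), \qquad
R_\varepsilon(u)\,u'v \le \beta(\alpha)\sigma(F_{\varepsilon v}) + \mu(F_{\varepsilon v})
\]

for any unit vector $v \in S^{d-1}$. These and the counterparts above yield

\[
\begin{aligned}
&\bigl(\beta(\alpha)[\sigma(F_{\varepsilon v_\varepsilon(u)}) - \sigma(F_{v_\varepsilon(u)})] + (\mu(F_{\varepsilon v_\varepsilon(u)}) - \mu(F_{v_\varepsilon(u)}))\bigr)/u'v_\varepsilon(u)\\
&\qquad \le R_\varepsilon(u) - R(u)\\
&\qquad \le \bigl(\beta(\alpha)[\sigma(F_{\varepsilon v(u)}) - \sigma(F_{v(u)})] + (\mu(F_{\varepsilon v(u)}) - \mu(F_{v(u)}))\bigr)/u'v(u).
\end{aligned}
\]

If we can show that both $\inf_{\|u\|=1}|u'v(u)|$ and $\inf_{\|u\|=1}|u'v_\varepsilon(u)|$ are bounded away from 0, the desired result follows immediately from (C2).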

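Since $PD^\alpha(F)$ is assumed to contain the origin, the deepest point, $O(0,F) = \beta(\alpha^*) < \beta(\alpha) = O(R(u)u,F)$ for any $\|u\| = 1$ (Theorem 2.3 of [33]). Hence

\[
\beta(\alpha) = \frac{u'v(u)R(u) - \mu(F_{v(u)})}{\sigma(F_{v(u)})} > O(0,F) + \gamma(\alpha,\alpha^*)
= \sup_{\|u\|=1}\frac{-\mu(F_u)}{\sigma(F_u)} + \gamma(\alpha,\alpha^*)
\]

for $\gamma(\alpha,\alpha^*) := (\beta(\alpha) - \beta(\alpha^*))/2$ and any $u \in S^{d-1}$, which, in turn, implies that

\[
\frac{u'v(u)R(u)}{\sigma(F_{v(u)})} > \sup_{\|u\|=1}\frac{-\mu(F_u)}{\sigma(F_u)} + \frac{\mu(F_{v(u)})}{\sigma(F_{v(u)})} + \gamma(\alpha,\alpha^*) \ge \gamma(\alpha,\alpha^*).
\]

Note that $PD^\alpha(F)$ is bounded with a nonempty interior (see Theorem 2.3 of [33]). Hence $0 < R(u) < \infty$ uniformly in $u$. Now we have

\[
\inf_{\|u\|=1}|u'v(u)| \ge \Bigl(\inf_{\|u\|=1}\sigma(F_u)\,\gamma(\alpha,\alpha^*)\Bigr)\Big/\sup_{\|u\|=1}R(u) > 0.
\]

The argument for showing $\inf_{\|u\|=1}|u'v_\varepsilon(u)| > 0$ is the same. First we have

\[
0 \in PD^{(\alpha+\delta)}(F) \subseteq PD^{(\alpha+\delta/2)}(F_\varepsilon) \subseteq PD^\alpha(F_\varepsilon),
\]

by Lemma A.1 for some $0 < \delta < \alpha^* - \alpha$ and sufficiently small $\varepsilon$, which gives

\[
O(0,F_\varepsilon) \le O(R^{(\alpha+\delta/2)}(u,F_\varepsilon)u, F_\varepsilon) < O(R^\alpha(u,F_\varepsilon)u, F_\varepsilon) = \beta(\alpha),
\]

uniformly in the unit vector $u$ for sufficiently small $\varepsilon > 0$ by (c) of Lemma A.1. Now treating $O(0,F_\varepsilon)$ as the $\beta(\alpha^*)$ above, we have the desired result by virtue of (C1)-(C2), (C0'), Lemma A.1 and Theorem 2.3 of [33]. □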

Lemma A.3. (a) $R^\alpha(u,F)$ is continuous in $u$ if (C0) holds. (b) $R^\alpha(u,F_\varepsilon)$ is continuous in $u$ for sufficiently small $\varepsilon > 0$ if (C0) and (C2) hold.

Proof. (a) Suppose that $R^\alpha(u,F)$ is not continuous in $u$. Then there is a sequence $u_m \to u_0$ such that $\limsup_{m\to\infty} R^\alpha(u_m,F) \ne R^\alpha(u_0,F)$. By the boundedness of $PD^\alpha(F)$, there is a subsequence $u_{m_k}$ of $u_m$ such that $u_{m_k} \to u_0$ and $\lim_{k\to\infty} R^\alpha(u_{m_k},F) = R_0 \ne R^\alpha(u_0,F)$. Note that $R_0$ must be less than $R^\alpha(u_0,F)$ since otherwise we have by the uniform continuity of $PD(x,F)$ in $x$ that

\[
\lim_{k\to\infty} PD(R^\alpha(u_{m_k},F)u_{m_k},F) = PD(R_0u_0,F) = \alpha = PD(R^\alpha(u_0,F)u_0,F),
\]

which contradicts the definition of $R^\alpha(u_0,F)$. Thus $R_0 < R^\alpha(u_0,F)$. The quasi-concavity of $PD(\cdot,F)$ (Theorem 2.1 of [33]) implies $PD(x,F) = \alpha$ for any point $x \in [R_0u_0, R^\alpha(u_0,F)u_0]$, and further all such points $x$ are boundary points of $PD^\alpha(F)$ in light of Lemma A.1.

Let $N_0(\varepsilon_0)$ be a small ball centered at the deepest point, the origin, and contained in $PD^\alpha(F)$. Let $N_{x_0}(\varepsilon_1)$ be a small ball centered at $x_0 \in [R_0u_0, R^\alpha(u_0,F)u_0]$ with $\varepsilon_1$ small enough such that the ray stemming from $R^\alpha(u_0,F)u_0$ and passing through the ball $N_{x_0}(\varepsilon_1)$ always passes through the ball $N_0(\varepsilon_0)$. But then there is a point $y_0 \in N_{x_0}(\varepsilon_1)$ with $y_0 \notin PD^\alpha(F)$ and a point $y_1 \in N_0(\varepsilon_0)$ such that

\[
y_0 = \lambda y_1 + (1-\lambda)R^\alpha(u_0,F)u_0, \qquad PD(y_0,F) < \alpha,
\]

for some $0 < \lambda < 1$, which contradicts the quasi-concavity of $PD(\cdot,F)$.

(b) From the proof of Lemma A.1 we see that (C0') holds for sufficiently small $\varepsilon > 0$ by virtue of (C0) and (C2). Then the quasi-concavity of $PD(\cdot,F_\varepsilon)$ follows:

\[
PD(\lambda x + (1-\lambda)y, F_\varepsilon) \ge \min\{PD(x,F_\varepsilon), PD(y,F_\varepsilon)\}
\]

for any $0 < \lambda < 1$ and sufficiently small $\varepsilon > 0$. Now invoking Lemma A.1 and the arguments utilized in (a) we can complete the proof. □
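The quasi-concavity used above also suggests a simple way to evaluate the directional radius $R^\alpha(u,F) = \sup\{r \ge 0 : ru \in PD^\alpha(F)\}$ numerically: along a ray from a point of the region, the depth crosses the level $\alpha$ only once, so bisection applies. The sketch below is illustrative only. The function names are hypothetical, and the closed-form depth used as an example assumes $F = N(0, I_d)$ with $(\mu,\sigma) = (\mathrm{Med},\mathrm{MAD})$, for which $O(x,F) = \|x\|/m_0$ with $m_0 = \Phi^{-1}(3/4)$ and hence $R^\alpha(u,F) = \beta(\alpha)m_0$.

```python
import math
from statistics import NormalDist


def directional_radius(depth, u, alpha, r_max=1e6, tol=1e-10):
    """R^alpha(u) = sup{r >= 0 : depth(r*u) >= alpha}, found by bisection.

    Assumes depth is quasi-concave and depth(origin) >= alpha, so that
    r -> depth(r*u) falls below alpha once and stays below.
    """
    if depth([0.0 for _ in u]) < alpha:
        raise ValueError("origin is not in the alpha-trimmed region")
    lo, hi = 0.0, 1.0
    # Expand the bracket until hi*u lies outside the region.
    while depth([hi * c for c in u]) >= alpha:
        lo, hi = hi, 2.0 * hi
        if hi > r_max:
            raise ValueError("region appears unbounded along u")
    # Invariant: depth(lo*u) >= alpha > depth(hi*u).
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if depth([mid * c for c in u]) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


M0 = NormalDist().inv_cdf(0.75)  # MAD of the standard normal


def normal_depth(x):
    """Projection depth at N(0, I) with (Med, MAD): 1 / (1 + ||x|| / m0)."""
    return 1.0 / (1.0 + math.sqrt(sum(c * c for c in x)) / M0)
```

With `alpha = 0.2`, so that $\beta(\alpha) = (1-\alpha)/\alpha = 4$, `directional_radius(normal_depth, [0.6, 0.8], 0.2)` should agree with `4 * M0` up to the bisection tolerance, and the value does not depend on the direction, as expected for a spherical model.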

We now prove Theorem 1. Following the proof of Lemma A.2, we have

\[
\begin{aligned}
&\bigl(\beta(\alpha)[\sigma(F_{\varepsilon v_\varepsilon(u)}) - \sigma(F_{v_\varepsilon(u)})] + (\mu(F_{\varepsilon v_\varepsilon(u)}) - \mu(F_{v_\varepsilon(u)}))\bigr)/u'v_\varepsilon(u)\\
&\qquad \le R_\varepsilon(u) - R(u)\\
&\qquad \le \bigl(\beta(\alpha)[\sigma(F_{\varepsilon v(u)}) - \sigma(F_{v(u)})] + (\mu(F_{\varepsilon v(u)}) - \mu(F_{v(u)}))\bigr)/u'v(u),
\end{aligned}
\]

and $\inf_{\|u\|=1}|u'v(u)|$ and $\inf_{\|u\|=1}|u'v_\varepsilon(u)|$ are bounded away from 0 for sufficiently small $\varepsilon > 0$. The desired result then follows from (C3) and the continuity of $IF(v(u)'x;\sigma,F_{v(u)})$ and $IF(v(u)'x;\mu,F_{v(u)})$ in $v(u)$ for $u \in A$, provided that we can show further that $v_\varepsilon(u) \to v(u)$ uniformly in $u$ as $\varepsilon \to 0$.

We first show that $v_\varepsilon(u) \to v(u)$ as $\varepsilon \to 0$ for a fixed $u$. If it is not true, then there are a sequence $\varepsilon_n \to 0$ and a small $\eta > 0$ such that $\|v_{\varepsilon_n}(u) - v(u)\| > \eta$ for $n \ge 1$. By the compactness of $S^{d-1}$, there is a subsequence of $v_{\varepsilon_n}(u)$, denoted still by $v_{\varepsilon_n}(u)$ for simplicity, that converges to $v_0 \in S^{d-1}$. Observe that

\[
O(R^\alpha(u,F_{\varepsilon_n})u, F_{\varepsilon_n}) = \frac{u'v_{\varepsilon_n}(u)R^\alpha(u,F_{\varepsilon_n}) - \mu(F_{\varepsilon_n v_{\varepsilon_n}(u)})}{\sigma(F_{\varepsilon_n v_{\varepsilon_n}(u)})}. \tag{13}
\]

Following the proof of Theorem 2.2 of [33] and by (C1) and (C2), we have (i) the Lipschitz continuity of $O(\cdot,F_{\varepsilon_n})$ for small $\varepsilon_n > 0$ and (ii) for large $M > 0$

\[
\sup_{\|y\|<M}|O(y,F_{\varepsilon_n}) - O(y,F)| \to 0 \quad\text{as } n \to \infty. \tag{14}
\]

These, together with (13), Lemmas A.1 and A.2 and (C1)-(C2), yield

\[
\frac{u'v_0R^\alpha(u,F) - \mu(F_{v_0})}{\sigma(F_{v_0})} = O(R^\alpha(u,F)u,F) = \frac{u'v(u)R^\alpha(u,F) - \mu(F_{v(u)})}{\sigma(F_{v(u)})}.
\]

Uniqueness of $v(u) = v(y)$ for $y = R^\alpha(u,F)u$ implies that $v(u) = v_0$, which, however, contradicts $\|v_0 - v(u)\| > \eta$. Hence $v_\varepsilon(u) \to v(u)$ for any fixed $u \in S^{d-1}$.

With the same argument, we can show that the convergence is uniform in the unit vector $u$, since otherwise there are a sequence $u_n \in S^{d-1}$, a sequence $\varepsilon_n$ ($\varepsilon_n \downarrow 0$ as $n \to \infty$) and some small $\eta > 0$ such that $\|v_{\varepsilon_n}(u_n) - v(u_n)\| > \eta$ as $n \to \infty$. By the compactness of $S^{d-1}$, there is a subsequence $u_{n_m}$ of $u_n$ such that $u_{n_m} \to u_0 \in S^{d-1}$, $v(u_{n_m}) \to v_0 \in S^{d-1}$ and $v_{\varepsilon_{n_m}}(u_{n_m}) \to v_1 \in S^{d-1}$ as $m \to \infty$. We then can show that $v_0 = v_1 = v(u_0)$ (here we need Lemma A.3). But this contradicts $\|v_{\varepsilon_{n_m}}(u_{n_m}) - v(u_{n_m})\| > \eta$ as $m \to \infty$. The desired result follows. □

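Proof of Corollary 1. We first verify the conditions in Theorem 1. We see that $\mu(F_u)$ ($= 0$) and $\sigma(F_u)$ ($= \sqrt{u'\Sigma u}\,m_0 > 0$) are continuous in $u \in S^{d-1}$ and

\[
\mu(F_u(\varepsilon,\delta_x)) = \mathrm{Med}\bigl\{\sqrt{u'\Sigma u}\,F_Z^{-1}(a_\varepsilon),\; u'x,\; \sqrt{u'\Sigma u}\,F_Z^{-1}(b_\varepsilon)\bigr\},
\]

\[
\sigma(F_u(\varepsilon,\delta_x)) = \mathrm{Med}\bigl\{\sqrt{u'\Sigma u}\,F_{Z_\varepsilon}^{-1}(a_\varepsilon),\; |u'x - \mu(F_{\varepsilon u})|,\; \sqrt{u'\Sigma u}\,F_{Z_\varepsilon}^{-1}(b_\varepsilon)\bigr\},
\]

where $a_\varepsilon = (1-2\varepsilon)/(2(1-\varepsilon))$, $b_\varepsilon = 1/(2(1-\varepsilon))$ and $Z_\varepsilon = |Z - \mu(F_{\varepsilon u})/\sqrt{u'\Sigma u}|$. It follows that both $\mu(F_u(\varepsilon,\delta_x))$ and $\sigma(F_u(\varepsilon,\delta_x))$ are continuous in $u \in S^{d-1}$ for sufficiently small $\varepsilon > 0$. Thus (C1) holds. The last two displays also lead to (C2):

\[
\sup_{\|u\|=1}|\mu(F_u(\varepsilon,\delta_x)) - \mu(F_u)| = o_x(1), \qquad \sup_{\|u\|=1}|\sigma(F_u(\varepsilon,\delta_x)) - \sigma(F_u)| = o_x(1).
\]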
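The median formula above can be checked numerically in one dimension. The sketch below is an illustration under the assumption that $Z$ is standard normal; `t` stands for the projected contamination point $u'x$ and `scale` for $\sqrt{u'\Sigma u}$, and both function names are hypothetical. It evaluates the three-point median and the distribution function of the contaminated model $(1-\varepsilon)N(0,\text{scale}^2) + \varepsilon\delta_t$, so that one can confirm the returned value is indeed a median.

```python
from statistics import NormalDist, median


def contaminated_median(t, eps, scale=1.0):
    """Median of (1 - eps) * N(0, scale^2) + eps * delta_t via the Med{.,.,.} formula."""
    z = NormalDist()
    a = (1.0 - 2.0 * eps) / (2.0 * (1.0 - eps))
    b = 1.0 / (2.0 * (1.0 - eps))
    return median([scale * z.inv_cdf(a), t, scale * z.inv_cdf(b)])


def contaminated_cdf(y, t, eps, scale=1.0):
    """Distribution function of the contaminated model at y."""
    jump = eps if y >= t else 0.0
    return (1.0 - eps) * NormalDist().cdf(y / scale) + jump
```

For a remote contamination point (say `t = 5.0`, `eps = 0.1`) the median is shifted only to the $b_\varepsilon$-quantile of the uncontaminated model, which tends to 0 as $\varepsilon \to 0$; this is the behavior required by (C2).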

Note that for any $y \in \partial PD^\alpha(F)$ it can be seen that $v(y) = \Sigma^{-1}y/\|\Sigma^{-1}y\|$, $O(y,F) = \|\Sigma^{-1/2}y\|/m_0 = \beta(\alpha)$ and $PD(y,F) = (1 + O(y,F))^{-1} = \alpha$. Hence $U(y) = \{v(y) : g(y,v(y),F) = O(y,F)\}$ is a singleton for any $y \in \partial PD^\alpha(F)$.

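It can be shown that for any $u \in S^{d-1}$

\[
IF(u'x;\mu,F_u) = \sqrt{u'\Sigma u}\,\mathrm{sign}(u'x)/(2h_Z(0)), \tag{15}
\]

\[
IF(u'x;\sigma,F_u) = \sqrt{u'\Sigma u}\,\mathrm{sign}(|u'x| - \sqrt{u'\Sigma u}\,m_0)/(4h_Z(m_0)). \tag{16}
\]

These and the expressions for $\mu(F_u(\varepsilon,\delta_x))$ and $\sigma(F_u(\varepsilon,\delta_x))$ above lead to (C3).

Obviously, both $IF(u'x;\mu,F_u)$ and $IF(u'x;\sigma,F_u)$ are continuous in $u \in S^{d-1}$ if $x = 0$. When $x \ne 0$, $IF(v(y)'x;\mu,F_{v(y)})$ is continuous in $v(y)$ for any $y \in A^* \subseteq \partial PD^\alpha(F)$ with $\partial PD^\alpha(F) - A^* = \{y : y'\Sigma^{-1}x = 0, y \in \partial PD^\alpha(F)\}$ and $P(\{y : y'\Sigma^{-1}x = 0, y \in \partial PD^\alpha(F)\}) = 0$ for fixed $x \in \mathbb{R}^d$. Likewise, when $x \ne 0$ we see that $IF(v(y)'x;\sigma,F_{v(y)})$ is continuous in $v(y)$ for any $y \in A^{**} \subseteq \partial PD^\alpha(F)$ with $\partial PD^\alpha(F) - A^{**} = \{y : y'\Sigma^{-1}x = \pm\beta(\alpha)m_0^2, y \in \partial PD^\alpha(F)\}$. The latter set is empty if $\|\Sigma^{-1/2}x\| < m_0$. Also $P(\{y : y'\Sigma^{-1}x = \pm\beta(\alpha)m_0^2, y \in \partial PD^\alpha(F)\}) = 0$ for fixed $x \in \mathbb{R}^d$. Thus there is a set $A \subseteq S^{d-1}$ with $P\{y : v(y) \in S^{d-1} - A, y \in \partial PD^\alpha(F)\} = 0$ such that $IF(v(u)'x;\mu,F_{v(u)})$ and $IF(v(u)'x;\sigma,F_{v(u)})$ with $v(u) = v(R^\alpha(u,F)u)$ are continuous in $v(u)$ for all $u \in A$. Here $A = S^{d-1}$ if $x = 0$ and $A = S^{d-1} - (\{u : u'\Sigma^{-1}x = 0\} \cup \{u : u'\Sigma^{-1}x = \pm\|\Sigma^{-1/2}u\|m_0\})$ if $x \ne 0$.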

Invoking Theorem 1 and (15) and (16), we have the desired result. □ 
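As an illustration of (15) and (16), the following sketch evaluates the two univariate influence functions under the assumption that $Z$ is standard normal, so that $h_Z$ is the standard normal density and $m_0 = \Phi^{-1}(3/4)$. Here `t` stands for $u'x$ and `scale` for $\sqrt{u'\Sigma u}$; the function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()
M0 = Z.inv_cdf(0.75)  # m0: the MAD of the standard normal


def sign(t):
    """Sign function with sign(0) = 0."""
    return (t > 0) - (t < 0)


def if_median(t, scale=1.0):
    """Equation (15): scale * sign(t) / (2 h_Z(0))."""
    return scale * sign(t) / (2.0 * Z.pdf(0.0))


def if_mad(t, scale=1.0):
    """Equation (16): scale * sign(|t| - scale * m0) / (4 h_Z(m0))."""
    return scale * sign(abs(t) - scale * M0) / (4.0 * Z.pdf(M0))
```

Both functions are bounded step functions of `t`: the first jumps at `t = 0` and the second at `|t| = scale * M0`, which are exactly the exceptional sets removed from $A$ in the argument above.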

Proof of Theorem 2. To prove the theorem, we need the following lemma.

Lemma A.4. Under (C1) and (C3)-(C5), we have

\[
(PD(y,F(\varepsilon,\delta_x)) - PD(y,F))/\varepsilon = h(x,y) + o_x(1) \quad\text{uniformly in } y \in B. \tag{17}
\]

Proof. By the conditions and the proof of Lemma 6.1 of [36], we have for $u(y,\tau) = \{u : \|u\| = 1, \|u - v(y)\| \le \tau\}$, $\tau > 0$,

\[
\inf_{u \in u(y,\tau)}\bigl(-g(y,u,F_\varepsilon) + g(y,u,F)\bigr) \le -O(y,F_\varepsilon) + O(y,F) \le -g(y,v(y),F_\varepsilon) + g(y,v(y),F),
\]

for y £ B. The given conditions on /i and a imply (CO). Hence PD°'~^(F) 
is bounded (Theorem 2.3 of [33]). Conditions (C3) and (C5) imply that 
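$\sigma(F_{\varepsilon u}) \to \sigma(F_u)$ uniformly for $u \in \{v(y) : y \in B\}$. These and (C3) yield

\[
\frac{-g(y,u,F_\varepsilon) + g(y,u,F)}{\varepsilon(1 + O(y,F))} = \frac{g(y,u,F)\,IF(u'x;\sigma,F_u) + IF(u'x;\mu,F_u)}{\sigma(F_u)(1 + O(y,F))} + o_x(1)
\]

uniformly in $y \in \mathbb{R}^d$ and in $u \in S^{d-1}$. Hence we have

\[
\begin{aligned}
&\inf_{u \in u(y,\tau)}\frac{g(y,u,F)\,IF(u'x;\sigma,F_u) + IF(u'x;\mu,F_u)}{\sigma(F_u)(1 + O(y,F))(1 + O(y,F_\varepsilon))} + o_x(1)\\
&\qquad \le \frac{PD(y,F_\varepsilon) - PD(y,F)}{\varepsilon}\\
&\qquad \le \frac{g(y,v(y),F)\,IF(v(y)'x;\sigma,F_{v(y)}) + IF(v(y)'x;\mu,F_{v(y)})}{\sigma(F_{v(y)})(1 + O(y,F))(1 + O(y,F_\varepsilon))} + o_x(1),
\end{aligned}
\]

uniformly in $y$ over $B$. Let $\tau = \varepsilon/2$. By the given conditions, the result follows. □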

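We now prove Theorem 2 based on the lemma. First we can write for fixed $\alpha$,

\[
PTM(F_\varepsilon) - PTM(F) = \frac{\int_{PD^\alpha(F_\varepsilon)}(y - PTM(F))\,w(PD(y,F_\varepsilon))\,dF_\varepsilon(y)}{\int_{PD^\alpha(F_\varepsilon)}w(PD(y,F_\varepsilon))\,dF_\varepsilon(y)}. \tag{18}
\]

The denominator can be written as

\[
(1-\varepsilon)\int I(y \in PD^\alpha(F_\varepsilon))\,w(PD(y,F_\varepsilon))\,dF(y) + \varepsilon\,I(PD(x,F_\varepsilon) \ge \alpha)\,w(PD(x,F_\varepsilon)),
\]

which, with Lemma A.1 and Lebesgue's dominated convergence theorem, yields

\[
\int_{PD^\alpha(F_\varepsilon)}w(PD(y,F_\varepsilon))\,dF_\varepsilon(y) = \int_{PD^\alpha(F)}w(PD(y,F))\,dF(y) + o_x(1). \tag{19}
\]

The numerator of (18) can be decomposed into three parts,

\[
\begin{aligned}
I_{1\varepsilon} &= \int_{PD^\alpha(F_\varepsilon)}y^*w(PD(y,F_\varepsilon))\,dF_\varepsilon(y) - \int_{PD^\alpha(F)}y^*w(PD(y,F_\varepsilon))\,dF_\varepsilon(y),\\
I_{2\varepsilon} &= \int_{PD^\alpha(F)}y^*w(PD(y,F_\varepsilon))\,dF_\varepsilon(y) - \int_{PD^\alpha(F)}y^*w(PD(y,F))\,dF_\varepsilon(y),\\
I_{3\varepsilon} &= \int_{PD^\alpha(F)}y^*w(PD(y,F))\,dF_\varepsilon(y),
\end{aligned}
\]

where $y^* = y - PTM(F)$. It follows immediately that

\[
I_{3\varepsilon}/\varepsilon = I(PD(x,F) \ge \alpha)(x - PTM(F))\,w(PD(x,F)). \tag{20}
\]

(C1) implies (C0). This, the continuity of $w^{(1)}(\cdot)$ and Lemma A.1 yield

\[
w(PD(y,F_\varepsilon)) - w(PD(y,F)) = (w^{(1)}(PD(y,F)) + o_x(1))\,H(y,F_\varepsilon), \tag{21}
\]

uniformly in $y$ for the given $x$, where $H(y,F_\varepsilon) = PD(y,F_\varepsilon) - PD(y,F)$. By Lemma A.4, (C5) and the boundedness of $PD^\alpha(F)$, it is seen that $(1 + \|y\|)\,IF(x; PD(y,F), F)$ is bounded uniformly in $y$ for $y \in B$ and the given $x \in \mathbb{R}^d$. This, together with (21), Lemma A.4 and the boundedness of $PD^\alpha(F)$, immediately gives

\[
I_{2\varepsilon}/\varepsilon = \int_{PD^\alpha(F)}(y - PTM(F))\,w^{(1)}(PD(y,F))\,h(x,y)\,dF(y) + o_x(1). \tag{22}
\]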

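Write $\Delta(y,\varepsilon,\alpha)$ for $I(PD(y,F_\varepsilon) \ge \alpha) - I(PD(y,F) \ge \alpha)$. By virtue of (21), Lemmas A.1 and A.4, the boundedness of $PD^\alpha(F)$ and $PD^\alpha(F_\varepsilon)$ for small $\varepsilon > 0$, and the argument used in the denominator of (18), we have

\[
\begin{aligned}
\frac{1}{\varepsilon}I_{1\varepsilon} &= \frac{1}{\varepsilon}\int\Delta(y,\varepsilon,\alpha)\,y^*w(PD(y,F))\,dF(y) - \int\Delta(y,\varepsilon,\alpha)\,y^*w(PD(y,F))\,dF(y)\\
&\quad + \int\Delta(y,\varepsilon,\alpha)\,y^*w^{(1)}(PD(y,F))\,h(x,y)\,dF(y) + o_x(1).
\end{aligned}
\]

Call the last three (integral) terms $I_{1\varepsilon i}$, $i = 1,2,3$, respectively. Then by Lemma A.1, (C3), (C5), the boundedness of $PD^\alpha(F)$ and $PD^\alpha(F_\varepsilon)$ for small $\varepsilon > 0$, the condition on $w$ and Lebesgue's dominated convergence theorem, $I_{1\varepsilon 2} = o_x(1)$ and $I_{1\varepsilon 3} = o_x(1)$. For $I_{1\varepsilon 1}$, by the mean value theorem and Theorem 1 we have

\[
\begin{aligned}
I_{1\varepsilon 1} &= \frac{1}{\varepsilon}\int_{S^{d-1}}\Bigl[\int_{R^\alpha(u,F)}^{R^\alpha(u,F_\varepsilon)}(ru - PTM(F))\,w(PD(ru,F))\,|J(u,r)|\,f(ru)\,dr\Bigr]du\\
&= \int_{S^{d-1}}(\theta_\varepsilon(u)u - PTM(F))\,w(PD(\theta_\varepsilon(u)u,F))\,|J(u,\theta_\varepsilon(u))|\,f(\theta_\varepsilon(u)u)\\
&\qquad\qquad \times\,(IF(x;R^\alpha(u,F),F) + o_x(1))\,du,
\end{aligned}
\]

where $\theta_\varepsilon(u)$ is a point between $R^\alpha(u,F_\varepsilon)$ and $R^\alpha(u,F)$ and $o_x(1)$ is in the uniform sense with respect to $u$. By Lemmas A.1 and A.2, (C5), the conditions on $f$ and $w$, the structure of $J(u,r)$, the boundedness of $PD^\alpha(F)$ and $PD^\alpha(F_\varepsilon)$ for small $\varepsilon > 0$ and Lebesgue's dominated convergence theorem, we have for $R^*(u) = R(u)u - PTM(F)$,

\[
\frac{1}{\varepsilon}I_{1\varepsilon} = \int_{S^{d-1}}R^*(u)\,w(PD(R(u)u,F))\,|J(u,R(u))|\,f(R(u)u)\,IF(x;R(u),F)\,du + o_x(1).
\]

The desired result now follows immediately from this, (19), (20) and (22). □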



Proof of Theorem 3. Write $u'X^n$ for $F_{nu}$ and $X^n$ for $F_n$ and skip the $d = 1$ case.

Consider the case $d > 1$. We first show that $m = \lfloor(n-d+1)/2\rfloor$ contaminating points are enough to break down $PTM^\alpha$. Move $m$ points of $X^n$ to the same site $y$. Denote the resulting data by $X_m^n = \{Z_1,\ldots,Z_n\}$. Assume the first $m$ points $Z_i$ ($1 \le i \le m$) are at site $y$ far away from the cloud $X^n$. For $u \in S^{d-1}$, the projected data set (to direction $u$) is $\{u'Z_1,\ldots,u'Z_n\}$. Since $m + \lfloor(n+d+2)/2\rfloor > n$, $|u'Z_i - \mu(u'X_m^n)|/\sigma(u'X_m^n) \le 2$ for all $1 \le i \le m$. This implies that $O(Z_i,X_m^n) \le 2$ for all $1 \le i \le m$. Hence $Z_i \in PD^\alpha(X_m^n)$ for $1 \le i \le m$ by (11). Since $\|\sum_i Z_iw(PD(Z_i,X_m^n))\| \to \infty$ as $\|y\| \to \infty$, $PTM^\alpha(X_m^n)$ breaks down.

Now we show that $m = \lfloor(n-d+1)/2\rfloor - 1$ contaminating points are not enough to break down $PTM^\alpha$. Again let $X_m^n = \{Z_1,\ldots,Z_n\}$ be any contaminated data set. Since $m < \lfloor(n+1)/2\rfloor$ and $m + \lfloor(n+d+2)/2\rfloor \le n$, $\sup_{\|u\|=1}\mu(u'X_m^n) < \infty$ and $\sup_{\|u\|=1}\sigma(u'X_m^n) < \infty$ uniformly for any contaminated data $X_m^n$ with $m$ original points contaminated. Hence $O(y,X_m^n) \to \infty$ as $\|y\| \to \infty$. That is, $y \notin PD^\alpha(X_m^n)$ when $\|y\|$ becomes very large. So $PTM^\alpha$ will not break down unless $PD^\alpha(X_m^n) \cap X_m^n$ becomes empty. We now show that the latter cannot happen.

Denote $\mu_u = \mu(u'X_m^n)$, $\sigma_u = \sigma(u'X_m^n)$ and $n_d = \lfloor(n+d+2)/2\rfloor$. Let $|u'Z_{i_1} - \mu_u| \le \cdots \le |u'Z_{i_{n_d}} - \mu_u| \le \cdots \le |u'Z_{i_n} - \mu_u|$ with the understanding that $\mu_u$, $\sigma_u$ and $Z_{i_j}$ depend on $X_m^n$ and $u$ for all $1 \le j \le n$. Since $m + d + 1 \le n_d$, among $Z_{i_1},\ldots,Z_{i_{n_d}}$ there are at least $d+1$ original points from $X^n$. Therefore

\[
\sigma(u'X_m^n) \ge \inf_{i_1,\ldots,i_{d+1}}\;\max_{1 \le k,l \le (d+1)}|u'(X_{i_k} - X_{i_l})|/2
\]

for any $X_m^n$ and $u$. Here $i_1,\ldots,i_{d+1}$ are arbitrary distinct integers from $\{1,\ldots,n\}$.

Clearly, there are at least $m_0 = \lfloor(n+2)/2\rfloor + 1$ original points, say (without loss of generality) $X_i$, $1 \le i \le m_0$, uncontaminated. Then it is not difficult to see that

\[
|u'X_i - \mu(u'X_m^n)| \le \max_{i_1,\ldots,i_{(m_0-1)}}\;\max_{1 \le k,l \le (m_0-1)}|u'(X_{i_k} - X_{i_l})|, \qquad 1 \le i \le m_0,
\]

for any $X_m^n$ and $u$. This, in conjunction with the last display, immediately yields

\[
O(X_i;X_m^n) \le \sup_{\|u\|=1}\frac{\max_{i_1,\ldots,i_{(m_0-1)}}\max_{1 \le k,l \le (m_0-1)}|u'(X_{i_k} - X_{i_l})|}{\inf_{i_1,\ldots,i_{d+1}}\max_{1 \le k,l \le (d+1)}|u'(X_{i_k} - X_{i_l})|/2},
\]

for $1 \le i \le m_0$. Hence $X_i \in PD^\alpha(X_m^n)$ for all $1 \le i \le m_0$ for any $0 < \alpha < \alpha_d$. That is, $PD^\alpha(X_m^n) \cap X_m^n$ is not empty. We complete the proof. □
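The two halves of this proof together say that, for $d > 1$, exactly $\lfloor(n-d+1)/2\rfloor$ contaminating points are needed to break down $PTM^\alpha$. A small sketch of the resulting count and fraction follows; the function names are hypothetical, and the formula is only meaningful under the conditions of Theorem 3, which are not restated in this appendix.

```python
def ptm_breakdown_count(n, d):
    """Smallest number of contaminating points that can break down PTM (case d > 1)."""
    if d <= 1:
        raise ValueError("the proof above treats the case d > 1 only")
    return (n - d + 1) // 2


def ptm_breakdown_point(n, d):
    """Finite sample breakdown point: breakdown count divided by n."""
    return ptm_breakdown_count(n, d) / n
```

For example, with `n = 100` and `d = 2` the count is 49, so the fraction is 0.49, and for fixed `d` the fraction approaches 1/2 as `n` grows.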

Proof of Theorem 4. The following lemma, an analogue of Lemma A.1, is needed in the sequel. It can be proved in much the same way as Lemma A.1.

Lemma A.5. Under (C0) and (C2') for $G = F$ and $F_n$ and very large $n$:

(a) $\sup_{x \in \mathbb{R}^d}|PD(x,F_n) - PD(x,F)| = o(1)$, a.s.,

(b) $PD(x,G)$ is Lipschitz continuous in $x \in \mathbb{R}^d$, a.s.,

(c) $\partial PD^\alpha(G) = \{x : PD(x,G) = \alpha\}$, a.s.,

(d) $PD^{(\alpha+\eta)}(F) \subseteq PD^{(\alpha+\eta/2)}(F_n) \subseteq PD^\alpha(F)$ a.s. for any $0 < \eta < \alpha^* - \alpha$.

Now we prove the theorem. Condition (C1') implies (C0). By Lemma A.5, $PD^\alpha(F_n)$ is nonempty and contains the origin a.s. for large $n$. Hence $R^\alpha(u,F_n)$ is well defined a.s. Condition (C1') also implies that there is a unit vector $v(x)$ such that $g(x,v(x),F) = O(x,F)$ for any $x \in \mathbb{R}^d$. Let $v(u) := v(R(u)u)$. Likewise, we have a unit vector $v_n(u) := v(R_n(u)u)$. By virtue of Lemma A.5, $O(R(u)u,F) = O(R_n(u)u,F_n) = \beta(\alpha)$ a.s. for sufficiently large $n$. Hence for any $v \in S^{d-1}$

\[
R(u)\,u'v(u) = \beta(\alpha)\sigma(F_{v(u)}) + \mu(F_{v(u)}), \qquad R(u)\,u'v \le \beta(\alpha)\sigma(F_v) + \mu(F_v).
\]

Likewise, we can have the same displays for $R_n(u)$, $v_n(u)$ and $v$. These give

\[
\begin{aligned}
&\bigl(\beta(\alpha)[\sigma(F_{nv_n(u)}) - \sigma(F_{v_n(u)})] + (\mu(F_{nv_n(u)}) - \mu(F_{v_n(u)}))\bigr)/u'v_n(u)\\
&\qquad \le R_n(u) - R(u)\\
&\qquad \le \bigl(\beta(\alpha)[\sigma(F_{nv(u)}) - \sigma(F_{v(u)})] + (\mu(F_{nv(u)}) - \mu(F_{v(u)}))\bigr)/u'v(u).
\end{aligned}
\]

If we can show that $\inf_{u \in S^{d-1}}|u'v(u)| > 0$ and $\inf_{u \in S^{d-1}}|u'v_n(u)| > 0$ almost surely for large $n$, then the theorem follows in a straightforward fashion from (C2').

The proof for $\inf_{u \in S^{d-1}}|u'v(u)| > 0$ is given in the proof of Lemma A.2. The argument for proving $\inf_{u \in S^{d-1}}|u'v_n(u)| > 0$ a.s. for sufficiently large $n$ is the same. But we need the following two almost sure results for sufficiently large $n$:

(CO") 

and $O(0,F_n) < O(R_n(u)u,F_n)$ a.s. The first one (C0'') follows from (C0) and (C2'). The second one follows from Lemma A.5 since the origin is an interior point of $PD^{\alpha+\delta}(F) \subseteq PD^\alpha(F_n)$ a.s. for some $0 < \delta < \alpha^* - \alpha$ and sufficiently large $n$. □

Proof of Theorem 5. The following lemma about the continuity of $R^\alpha(u,F)$ in $u \in S^{d-1}$ is needed in the proof of the theorem.

Lemma A.6. Under (C0) and (C2'), $R^\alpha(u,F_n)$ is continuous in $u$ for large $n$.

We now prove Theorem 5. Following the proof of Theorem 4, we have

\[
\begin{aligned}
&\bigl(\beta(\alpha)[\sigma(F_{nv_n(u)}) - \sigma(F_{v_n(u)})] + (\mu(F_{nv_n(u)}) - \mu(F_{v_n(u)}))\bigr)/u'v_n(u)\\
&\qquad \le R_n(u) - R(u)\\
&\qquad \le \bigl(\beta(\alpha)[\sigma(F_{nv(u)}) - \sigma(F_{v(u)})] + (\mu(F_{nv(u)}) - \mu(F_{v(u)}))\bigr)/u'v(u),
\end{aligned}
\]

and $\inf_{u \in S^{d-1}}u'v(u) > 0$ and $\inf_{u \in S^{d-1}}u'v_n(u) > 0$ almost surely for $n$ large.

By the compactness of $S^{d-1}$, the continuity in (C1') is uniform in $u \in S^{d-1}$. This, in conjunction with the last display, (C3') and standard results on empirical processes (see, e.g., Problem II.18, Approximation Lemma II.25, Lemma II.36, Equicontinuity Lemma VII.15, and (the central limit theorem for empirical processes) Theorem VII.21 of [20], or see [32]), gives the desired results if we can show that $v_n(u) \to v(u)$ uniformly in the unit vector $u$ as $n \to \infty$. The latter can be done in much the same way as the uniform convergence of $v_\varepsilon(u) \to v(u)$ as $\varepsilon \to 0$ in the proof of Theorem 1.
□

Proof of Theorem 6. The desired result follows if we show that the numerator and the denominator of $PTM(F_n)$ converge a.s. to those of $PTM(F)$, respectively. Clearly it suffices to treat just the numerator. The given conditions imply (C0), which, combined with (C2'), Lemma A.5 and the continuity of $w^{(1)}$, yields

\[
w(PD(x,F_n)) = w(PD(x,F)) + o(1) \quad\text{a.s. and uniformly in } x \in \mathbb{R}^d. \tag{23}
\]

The boundedness of $PD^{\alpha'}(F)$ for any $\alpha' > 0$ (see Theorem 2.3 of [33]) and Lemma A.5 imply the almost sure boundedness of $PD^\alpha(F_n)$ for sufficiently large $n$. This, together with (23), implies that

\[
\int_{PD(x,F_n) \ge \alpha}w(PD(x,F_n))\,x\,dF_n(x) = \int_{PD(x,F_n) \ge \alpha}w(PD(x,F))\,x\,dF_n(x) + o(1) \quad\text{a.s.}
\]

The desired result follows if we can show that

\[
\int_{PD(x,F_n) \ge \alpha}w(PD(x,F))\,x\,dF_n(x) - \int_{PD(x,F) \ge \alpha}w(PD(x,F))\,x\,dF(x) = o(1) \quad\text{a.s.}
\]

In light of the (a.s.) compactness of $PD^\alpha(F)$ and $PD^\alpha(F_n)$, Lemma A.5 and Lebesgue's dominated convergence theorem, we see that

\[
\int[I(PD(x,F_n) \ge \alpha) - I(PD(x,F) \ge \alpha)]\,w(PD(x,F))\,x\,dF(x) = o(1) \quad\text{a.s.}
\]

Thus we only need to show that

\[
\int I(PD(x,F_n) \ge \alpha)\,w(PD(x,F))\,x\,d(F_n - F)(x) = o(1) \quad\text{a.s.} \tag{24}
\]

Let $\delta \in (0,\alpha)$. Then $PD^{\alpha-\delta}(F)$ is convex and compact (Theorem 2.3 of [33]). By Lemma A.5, $PD^\alpha(F_n) \subseteq PD^{\alpha-\delta}(F)$ a.s. for sufficiently large $n$. This and the convexity of $O(\cdot,F_n)$ imply that $PD^\alpha(F_n)$ is convex and compact and contained in $PD^{\alpha-\delta}(F)$ a.s. Define $\mathcal{C} = \{C : C \subseteq PD^{\alpha-\delta}(F) \text{ is compact and convex}\}$. Then $PD^\alpha(F_n) \in \mathcal{C}$ a.s. for sufficiently large $n$. By a well-known result of Ranga Rao ([22], Theorem 4.2), $\mathcal{C}$ is an $F$-Glivenko-Cantelli class (see [32] for the corresponding definition and related discussion) and (24) follows from the boundedness of $w$ and $PD^\alpha(F_n)$ (a.s.). □
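The numerator and denominator treated in this proof are, at the sample level, plain weighted sums over the points whose depth is at least $\alpha$. The sketch below computes the resulting sample trimmed mean from precomputed depth values. It is an illustration only: the function name is hypothetical, the depths are taken as given (however they were computed), and the default weight $w \equiv 1$ is an assumption, not the paper's choice.

```python
def trimmed_mean(points, depths, alpha, weight=lambda t: 1.0):
    """Depth-trimmed mean: sum of w(depth_i) * x_i over {i : depth_i >= alpha},
    divided by the sum of w(depth_i) over the same index set."""
    kept = [(p, weight(dep)) for p, dep in zip(points, depths) if dep >= alpha]
    total = sum(wt for _, wt in kept)
    if total <= 0:
        raise ValueError("no point with positive weight in the trimmed region")
    dim = len(points[0])
    return [sum(wt * p[j] for p, wt in kept) / total for j in range(dim)]
```

For instance, with points `[[0, 0], [1, 0], [10, 10]]`, depths `[0.9, 0.5, 0.05]` and `alpha = 0.2`, the remote third point is trimmed and the estimate is the average of the first two points.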

Proof of Theorem 7. The following representation of $H_n(x) := \sqrt{n}\,(PD(x,F_n) - PD(x,F))$, established in Lemma 5.2 of [35], is needed.

Lemma A.7. Let $\mu(F_u)$ and $\sigma(F_u)$ be continuous in $u$ and $\sigma(F_u) > 0$ for $u \in S^{d-1}$. Then under (C3') and (C4) we have for $\nu_n = \sqrt{n}(F_n - F)$

\[
\sqrt{n}(PD(x,F_n) - PD(x,F)) = \int h(y,x)\,\nu_n(dy) + o_p(1) \quad\text{uniformly in } x \in B
\]

with $h(y,x) = (O(x,F)f_2(y,u(x)) + f_1(y,u(x)))/(\sigma(F_{u(x)})(1 + O(x,F))^2)$.

We now prove Theorem 7. First we note that

\[
PTM(F_n) - PTM(F) = \frac{\int_{PD^\alpha(F_n)}(x - PTM(F))\,w(PD(x,F_n))\,dF_n(x)}{\int_{PD^\alpha(F_n)}w(PD(x,F_n))\,dF_n(x)}.
\]

Following the proof of Theorem 6, we can see immediately that

\[
\int_{PD^\alpha(F_n)}w(PD(x,F_n))\,dF_n(x) = \int_{PD^\alpha(F)}w(PD(x,F))\,dF(x) + o(1) \quad\text{a.s.} \tag{25}
\]

Decompose the numerator of $PTM(F_n) - PTM(F)$ into three parts,

\[
\begin{aligned}
I_{1n} &= \int_{PD^\alpha(F_n)}x^*w(PD(x,F_n))\,dF_n(x) - \int_{PD^\alpha(F)}x^*w(PD(x,F_n))\,dF_n(x),\\
I_{2n} &= \int_{PD^\alpha(F)}x^*w(PD(x,F_n))\,dF_n(x) - \int_{PD^\alpha(F)}x^*w(PD(x,F))\,dF_n(x),\\
I_{3n} &= \int_{PD^\alpha(F)}x^*w(PD(x,F))\,d(F_n(x) - F(x)),
\end{aligned}
\]

where $x^* = x - PTM(F)$. Obviously

\[
\sqrt{n}\,I_{3n} = \int(x - PTM(F))\,w(PD(x,F))\,I(x \in PD^\alpha(F))\,d\nu_n(x). \tag{26}
\]

We now work on $I_{2n}$. By (C3') and the central limit theorem for empirical processes (see, e.g., Theorem VII.21 of [20]) we have that

\[
\sup_{\|u\|=1}|\mu(F_{nu}) - \mu(F_u)| = O_p(1/\sqrt{n}), \qquad \sup_{\|u\|=1}|\sigma(F_{nu}) - \sigma(F_u)| = O_p(1/\sqrt{n}), \tag{27}
\]

which then imply (see Theorem 2.2 and Remark 2.5 of [33]) that

\[
\sup_{x \in \mathbb{R}^d}(1 + \|x\|)\,|PD(x,F_n) - PD(x,F)| = O_p(1/\sqrt{n}). \tag{28}
\]

By the continuous differentiability of $w$ on $[0,1]$ and Lemma A.5, we have that

\[
w(PD(x,F_n)) - w(PD(x,F)) = (w^{(1)}(PD(x,F)) + o(1))\,H_n(x)/\sqrt{n} \quad\text{a.s.} \tag{29}
\]

uniformly in $x \in \mathbb{R}^d$. This, the boundedness of $PD^\alpha(F)$ and (28) imply

\[
\sqrt{n}\,I_{2n} = \int_{PD^\alpha(F)}(x - PTM(F))\,w^{(1)}(PD(x,F))\,H_n(x)\,dF_n(x) + o_p(1). \tag{30}
\]

We now show that

\[
\sqrt{n}\,I_{2n} = \int_{PD^\alpha(F)}(x - PTM(F))\,w^{(1)}(PD(x,F))\,H_n(x)\,dF(x) + o_p(1). \tag{31}
\]

Clearly, we can view $H_n(\cdot)$ for every $n$ as a map into $\ell^\infty(\mathbb{R}^d)$, the space of all uniformly bounded, real functions on $\mathbb{R}^d$. By Lemma A.7 (and its proof; see [35]) and Theorem 1.5.4 of [32], we see that $H_n$ is asymptotically tight on $B$ (the set defined in the theorem). Consequently, for every $\varepsilon > 0$ there are finitely many continuous functions $h_1,\ldots,h_k$ on $B$ such that

\[
\limsup_{n\to\infty}P\Bigl\{\min_{1 \le i \le k}\|H_n - h_i\|_\infty > \varepsilon\Bigr\} < \varepsilon.
\]

Since the functions $I(PD(x,F) \ge \alpha)(x - PTM(F))\,w^{(1)}(PD(x,F))\,h_i(x)$ are bounded and continuous for $x \in \{PD(x,F) \ge \alpha\}$,

\[
\begin{aligned}
&\Bigl\|\int_{PD^\alpha(F)}(x - PTM(F))\,w^{(1)}(PD(x,F))\,H_n(x)\,d(F_n - F)(x)\Bigr\|\\
&\qquad \le \max_{1 \le i \le k}\Bigl\|\int_{PD^\alpha(F)}(x - PTM(F))\,w^{(1)}(PD(x,F))\,h_i(x)\,d(F_n - F)(x)\Bigr\|\\
&\qquad\quad + (2\varepsilon)\sup_{x \in PD^\alpha(F)}\|(x - PTM(F))\,w^{(1)}(PD(x,F))\|\\
&\qquad = O(\varepsilon) + o(1)
\end{aligned}
\]

with asymptotic probability not less than $1 - \varepsilon$. Thus we obtain (31), which, in conjunction with Lemma A.7, (C3') and Fubini's theorem, gives

\[
\sqrt{n}\,I_{2n} = \int\Bigl[\int_{PD^\alpha(F)}(y - PTM(F))\,w^{(1)}(PD(y,F))\,h(x,y)\,dF(y)\Bigr]d\nu_n(x) + o_p(1). \tag{32}
\]

We now work on $I_{1n}$. Let $\Delta_n(x) = I(PD(x,F_n) \ge \alpha) - I(PD(x,F) \ge \alpha)$. By (28) and (29) and the boundedness of $PD^\alpha(F)$ and $PD^\alpha(F_n)$ (a.s. for sufficiently large $n$) (see Lemma A.5), we have

\[
\begin{aligned}
\sqrt{n}\,I_{1n} &= \sqrt{n}\int\Delta_n(x)(x - PTM(F))\,w(PD(x,F))\,dF_n(x)\\
&\quad + \int\Delta_n(x)(x - PTM(F))\,w^{(1)}(PD(x,F))\,H_n(x)\,dF_n(x) + o_p(1),
\end{aligned}
\]

for sufficiently large $n$. Call the two terms on the right-hand side $I_{1n1}$ and $I_{1n2}$, respectively.

We first show that $I_{1n2} = o_p(1)$. Observe that by (28) we have

\[
\|I_{1n2}\| \le |O_p(1)|\int\|x - PTM(F)\|\,|\Delta_n(x)\,w^{(1)}(PD(x,F))|\,dF_n(x).
\]

Invoking the Skorohod (representation) theorem, we assume that $Y_n$ and $Y$ are defined on the probability space $(\Omega,\mathcal{F},P)$ such that $Y_n - Y = o(1)$ a.s. ($P$) and $F_{Y_n} = F_n$ and $F_Y = F$. By changing the variables in the above integral, we have

\[
\int\|x^*\|\,|\Delta_n(x)\,w^{(1)}(PD(x,F))|\,dF_n(x) = \int_\Omega\|Y_n^*\|\,|\Delta_n(Y_n)\,w^{(1)}(PD(Y_n,F))|\,dP,
\]

where $Y_n^* = Y_n - PTM(F)$ and $\Delta_n(Y_n) \to 0$ a.s. by Lemma A.5. This, Lemma A.5 and Lebesgue's dominated convergence theorem yield immediately $I_{1n2} = o_p(1)$.

We now show that

\[
I_{1n1} = \sqrt{n}\int\Delta_n(x)(x - PTM(F))\,w(PD(x,F))\,dF(x) + o_p(1). \tag{33}
\]

This can be accomplished by utilizing the results of an $F$-Donsker class of functions and the asymptotic equicontinuity (see [32]) and the fact that $\int(I(PD(x,F_n) \ge \alpha) - I(PD(x,F) \ge \alpha))^2\,dF(x) \to 0$. Observe that



\[
\sqrt{n}\int\Delta_n(x)(x - PTM(F))\,w(PD(x,F))\,dF(x)
= \sqrt{n}\int_{S^{d-1}}\Bigl[\int_{R^\alpha(u,F)}^{R^\alpha(u,F_n)}(ru - PTM(F))\,w(PD(ru,F))\,f(ru)\,|J(u,r)|\,dr\Bigr]du.
\]

Let $\theta_n(u)$ be a point in between $R^\alpha(u,F_n)$ and $R^\alpha(u,F)$. Then by Theorem 4, $J(u,\theta_n(u)) = J(u,R^\alpha(u,F)) + o(1)$ a.s. uniformly in $u \in S^{d-1}$. By Theorem 4 and Lemma A.5, $w(PD(\theta_n(u)u,F)) = w(PD(R^\alpha(u,F)u,F)) + o(1) = w(\alpha) + o(1)$ a.s. and uniformly in $u \in S^{d-1}$. Finally by the continuity of $f$ [in a small neighborhood of $\partial PD^\alpha(F)$] and of $R^\alpha(u,F_n)$ and $R^\alpha(u,F)$ uniformly in $u$ (see Lemmas A.3 and A.6) for large $n$, the compactness of $S^{d-1}$ and Theorem 4, $f(\theta_n(u)u) = f(R^\alpha(u,F)u) + o(1)$, a.s. uniformly in $u \in S^{d-1}$ for large $n$. These, (33), the preceding display, the mean value theorem, the uniform continuity in $u$ of $w(PD(R^\alpha(u,F)u,F))$ and $J(u,R^\alpha(u,F))$, Theorems 4 and 5, and Fubini's theorem, yield

\[
\begin{aligned}
I_{1n1} &= \int_{S^{d-1}}\bigl(R^*(u)\,w(\alpha)\,|J(u,R(u))|\,f(R(u)u) + o(1)\bigr)K_n(u)\,du + o_p(1)\\
&= \int\Bigl[\int_{S^{d-1}}R^*(u)\,w(\alpha)\,|J(u,R(u))|\,f(R(u)u)\,k(x,R(u)u)\,du\Bigr]d\nu_n(x) + o_p(1),
\end{aligned}
\]

for $R^*(u) = R(u)u - PTM(F)$, $K_n(u) := \sqrt{n}(R(u,F_n) - R(u,F))$. This gives

\[
\sqrt{n}\,I_{1n} = \int\Bigl[\int_{S^{d-1}}R^*(u)\,w(\alpha)\,|J(u,R(u))|\,f(R(u)u)\,k(x,R(u)u)\,du\Bigr]d\nu_n(x) + o_p(1),
\]

which, combined with (32), (26) and (25), gives the desired result. □